2008-02-16 00:42:20

by kelk1

Subject: Kernel oops with bluetooth usb dongle

Hi,

Since the 2.6.24 release candidates, my machine crashes when I try to use the
USB dongle. No other activity seems to cause a crash. The system is stable with
2.6.23.1. It is a Dell Dimension 8400, updated daily to Mandriva Cooker. I am
not subscribed to this list, so please CC me if there is any more information
I can provide.

Here is the Oops with 2.6.25-rc1:

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c0131f3e>] get_next_timer_interrupt+0xfe/0x210
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: hidp rfcomm l2cap nfsd auth_rpcgss exportfs nfs lockd nfs_acl
sunrpc af_packet binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat dm_mod piix
ide_core fuse snd_pcm_oss snd_mixer_oss hci_usb snd_intel8x0 snd_ac97_codec
ac97_bus thermal button snd_pcm i2c_i801 snd_timer parport_pc processor snd
parport pcspkr i2c_core evdev dcdbas tg3 soundcore rtc_cmos bluetooth
snd_page_alloc iTCO_wdt sr_mod iTCO_vendor_support sg ata_piix ahci libata
sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ssb pcmcia pcmcia_core ehci_hcd
usbcore [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted (2.6.25-rc1 #1)
EIP: 0060:[<c0131f3e>] EFLAGS: 00010097 CPU: 0
EIP is at get_next_timer_interrupt+0xfe/0x210
EAX: 00000000 EBX: 00000000 ECX: c047e7dc EDX: 00000000
ESI: 00000026 EDI: c047e6ac EBP: c03f5f60 ESP: c03f5f28
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c03f4000 task=c03c9300 task.ti=c03f4000)
Stack: 0005a600 0005a55b c047dea0 00000000 00000001 00000026 000005a6 c047e6ac
c047e8ac c047eaac c047ecac c1808340 ffc484c0 0005a55b c03f5fa4 c014627c
ffd3c700 c03c9300 c03c947c c180b420 ffc4cdde 0000009b ffc484c0 0000009b
Call Trace:
[<c014627c>] ? tick_nohz_stop_sched_tick+0x13c/0x350
[<c014658c>] ? tick_nohz_restart_sched_tick+0xfc/0x140
[<c0103a20>] ? default_idle+0x0/0xa0
[<c0103854>] ? cpu_idle+0x34/0x110
[<c0322519>] ? rest_init+0x49/0x50
=======================
Code: e0 89 c6 89 45 dc 8d b4 26 00 00 00 00 8b 04 f7 8b 10 0f 18 02 90 8d 0c f7
39 c8 0f 84 83 00 00 00 8b 40 08 39 d8 0f 48 d8 89 d0 <8b> 12 0f 18 02 90 39 c1
75 ec c7 45 d4 01 00 00 00 8b 7d dc 85
EIP: [<c0131f3e>] get_next_timer_interrupt+0xfe/0x210 SS:ESP 0068:c03f5f28
---[ end trace 5fb484ad8037e593 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
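
As a reading aid for the oops above: the byte wrapped in <..> on the "Code:" line is the first byte of the instruction at EIP. The sketch below is a minimal Python illustration, not a disassembler. It assumes only the standard x86 ModRM layout and handles just the one opcode (0x8b, a 32-bit register load) that appears here; the function names are made up for this example.

```python
# Illustrative sketch only: decodes a single opcode form, nothing more.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Tail of the "Code:" line from the 2.6.25-rc1 oops; <8b> marks the byte at EIP.
CODE_TAIL = "0f 48 d8 89 d0 <8b> 12 0f 18 02 90 39 c1"


def faulting_bytes(code_line):
    """Return the bytes starting at the one the oops marks with <..>."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_load(insn):
    """Decode 'mov (%reg),%reg' (opcode 0x8b, ModRM mod=00) in AT&T syntax."""
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # rm=4 (SIB byte) and rm=5 (disp32) need extra bytes; not handled here.
    if opcode != 0x8B or mod != 0 or rm in (4, 5):
        return None
    return "mov (%{}),%{}".format(REG32[rm], REG32[reg])


insn = faulting_bytes(CODE_TAIL)
print(decode_mov_load(insn))
```

Under those assumptions the marked bytes decode to a load through EDX. The register dump shows EDX: 00000000, which fits the reported NULL pointer dereference at 00000000.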

$ lsub -vvv
00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
USB UHCI #3 (rev 03) (prog-if 00 [UHCI])
Subsystem: Dell Dimension 8400
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin C routed to IRQ 18
Region 4: I/O ports at ff40 [size=32]
Kernel driver in use: uhci_hcd
Kernel modules: uhci-hcd

# ./ver_linux
Linux macine.domain.com 2.6.25-rc1 #1 SMP Fri Feb 15 14:39:54 PST 2008 i686
Intel(R) Pentium(R) 4 CPU 3.20GHz GNU/Linux

Gnu C 4.2.2
Gnu make 3.81
binutils 2.18.50.0.3.20071102
util-linux 2.13.1
mount 2.13.1
module-init-tools 3.3-pre11
e2fsprogs 1.40.6
PPP 2.4.4
Linux C Library 2.7
Dynamic linker (ldd) 2.7
Procps 3.2.7
Net-tools 1.60
Kbd 1.12
Sh-utils 6.10
udev 118
Modules Loaded nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc
af_packet binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat dm_mod piix ide_core
fuse snd_pcm_oss snd_mixer_oss hci_usb snd_intel8x0 snd_ac97_codec thermal
ac97_bus i2c_i801 snd_pcm processor button i2c_core parport_pc snd_timer pcspkr
iTCO_wdt sr_mod rtc_cmos parport snd iTCO_vendor_support soundcore bluetooth
dcdbas tg3 evdev snd_page_alloc sg ata_piix ahci libata sd_mod scsi_mod ext3 jbd
uhci_hcd ohci_hcd ssb pcmcia pcmcia_core ehci_hcd usbcore

Bus 003 Device 002: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle
(HCI mode)
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 224 Wireless
bDeviceSubClass 1 Radio Frequency
bDeviceProtocol 1 Bluetooth
bMaxPacketSize0 64
idVendor 0x0a12 Cambridge Silicon Radio, Ltd
idProduct 0x0001 Bluetooth Dongle (HCI mode)
bcdDevice 1.84
iManufacturer 0
iProduct 0
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 193
bNumInterfaces 3
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 224 Wireless
bInterfaceSubClass 1 Radio Frequency
bInterfaceProtocol 1 Bluetooth
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0010 1x 16 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 224 Wireless
bInterfaceSubClass 1 Radio Frequency
bInterfaceProtocol 1 Bluetooth
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0000 1x 0 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0000 1x 0 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 1
bNumEndpoints 2
bInterfaceClass 224 Wireless
bInterfaceSubClass 1 Radio Frequency
bInterfaceProtocol 1 Bluetooth
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0009 1x 9 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0009 1x 9 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 2
bNumEndpoints 2
bInterfaceClass 224 Wireless
bInterfaceSubClass 1 Radio Frequency
bInterfaceProtocol 1 Bluetooth
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0011 1x 17 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0011 1x 17 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 3
bNumEndpoints 2
bInterfaceClass 224 Wireless
bInterfaceSubClass 1 Radio Frequency
bInterfaceProtocol 1 Bluetooth
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0019 1x 25 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0019 1x 25 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 4
bNumEndpoints 2
bInterfaceClass 224 Wireless
bInterfaceSubClass 1 Radio Frequency
bInterfaceProtocol 1 Bluetooth
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0021 1x 33 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0021 1x 33 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 5
bNumEndpoints 2
bInterfaceClass 224 Wireless
bInterfaceSubClass 1 Radio Frequency
bInterfaceProtocol 1 Bluetooth
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0031 1x 49 bytes
bInterval 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0031 1x 49 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
bNumEndpoints 0
bInterfaceClass 254 Application Specific Interface
bInterfaceSubClass 1 Device Firmware Update
bInterfaceProtocol 0
iInterface 0
** UNRECOGNIZED: 07 21 07 88 13 ff 03
Device Status: 0x0000
(Bus Powered)

Bus 003 Device 001: ID 1d6b:0001
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 9 Hub
bDeviceSubClass 0 Unused
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
idVendor 0x1d6b
idProduct 0x0001
bcdDevice 2.06
iManufacturer 3 Linux 2.6.25-rc1 uhci_hcd
iProduct 2 UHCI Host Controller
iSerial 1 0000:00:1d.1
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 25
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 9 Hub
bInterfaceSubClass 0 Unused
bInterfaceProtocol 0 Full speed (or root) hub
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0002 1x 2 bytes
bInterval 255
Hub Descriptor:
bLength 9
bDescriptorType 41
nNbrPorts 2
wHubCharacteristic 0x000a
No power switching (usb 1.0)
Per-port overcurrent protection
bPwrOn2PwrGood 1 * 2 milli seconds
bHubContrCurrent 0 milli Ampere
DeviceRemovable 0x00
PortPwrCtrlMask 0xff
Hub Port Status:
Port 1: 0000.0103 power enable connect
Port 2: 0000.0100 power
Device Status: 0x0003
Self Powered
Remote Wakeup Enabled

Thank you for your time.
--
kk1


2008-02-16 11:06:11

by Jiri Kosina

Subject: Re: Kernel oops with bluetooth usb dongle

[ Ingo and Thomas added to CC, as this is apparently nohz stuff ]

Quel, does the problem go away when you boot with nohz=off?

--
Jiri Kosina
SUSE Labs

2008-02-16 11:51:19

by Thomas Gleixner

Subject: Re: Kernel oops with bluetooth usb dongle

On Sat, 16 Feb 2008, Jiri Kosina wrote:
> [ Ingo and Thomas added to CC, as this is apparently nohz stuff ]

Well, it explodes there :) I can not exactly decode the source line,
but it's either list corruption or something is fiddling with an
enqueued timer.

> Quel, does the problem go away when you boot with nohz=off?

Quel, can you please compile the kernel with CONFIG_DEBUG_INFO=y ?

Then after a crash, pick the address:

> > IP: [<c0131f3e>] get_next_timer_interrupt+0xfe/0x210
----------^^^^^^^^
and run it through:

# addr2line -e vmlinux c0131f3e

Please also enable CONFIG_DEBUG_LIST=y, which should catch the place
where a list corruption happens.

Thanks,

tglx
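
A small sketch of the address handling described above, in case it helps when collecting addresses from a trace. It assumes the "[<address>] symbol+0xoffset/0xsize" layout seen in this oops and only builds the addr2line argument list; it does not run anything. The helper names and the default "vmlinux" path are placeholders for this example.

```python
import re

# Matches "[<c0131f3e>] get_next_timer_interrupt+0xfe/0x210", with or
# without the "? " marker that the call trace lines carry.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(?:\?\s+)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frame(line):
    """Split one oops line into address, symbol, offset and function size."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr, symbol, offset, size = m.groups()
    addr, offset, size = int(addr, 16), int(offset, 16), int(size, 16)
    # The offset is relative to the start of the function.
    return {"addr": addr, "symbol": symbol, "offset": offset,
            "size": size, "start": addr - offset}


def addr2line_command(lines, vmlinux="vmlinux"):
    """Build an addr2line argument list for every frame found in lines."""
    frames = [f for f in map(parse_frame, lines) if f is not None]
    return ["addr2line", "-e", vmlinux] + [format(f["addr"], "08x") for f in frames]


oops = [
    "IP: [<c0131f3e>] get_next_timer_interrupt+0xfe/0x210",
    "[<c014627c>] ? tick_nohz_stop_sched_tick+0x13c/0x350",
]
print(" ".join(addr2line_command(oops)))
```

The vmlinux passed to addr2line has to be the one built with CONFIG_DEBUG_INFO=y for the kernel that crashed, otherwise the addresses will not line up.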

2008-02-16 22:50:06

by kelk1

Subject: Re: Kernel oops with bluetooth usb dongle

-------------- Original message ----------------------
From: Thomas Gleixner <[email protected]>
> On Sat, 16 Feb 2008, Jiri Kosina wrote:
> > [ Ingo and Thomas added to CC, as this is apparently nohz stuff ]
>
> Well, it explodes there :) I can not exactly decode the source line,
> but it's either list corruption or something is fiddling with an
> enqueued timer.
>
> > Quel, does the problem go away when you boot with nohz=off?
>
> Quel, can you please compile the kernel with CONFIG_DEBUG_INFO=y ?
>
> Then after a crash, pick the address:
>
> > > IP: [<c0131f3e>] get_next_timer_interrupt+0xfe/0x210
> ----------^^^^^^^^
> and run it through:
>
> # addr2line -e vmlinux c0131f3e
>
> Please also enable CONFIG_DEBUG_LIST=y, which should catch the place
> where a list corruption happens.
>

Hi,

Thank you for the hand-holding. I must admit I do not know anything about kernel debugging.

With or without nohz=off, the crashes are very similar. It looks like it fails to execute list_add_tail(&timer->entry, vec), line 294 of kernel/timer.c.

I hope this helps. Let me know if you need more info or want me to try anything else.
--
Eric

With nohz=off:

list_add corruption. prev->next should be next (c047e704), but was 00000000. (prev=ddcca118).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [#1] SMP
Modules linked in: hidp rfcomm l2cap nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc af_packet binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat dm_mod piix ide_core fuse snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm thermal snd_timer processor hci_usb button snd dcdbas i2c_i801 pcspkr soundcore evdev parport_pc i2c_core rtc_cmos sr_mod snd_page_alloc tg3 parport sg bluetooth iTCO_wdt iTCO_vendor_support ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ssb pcmcia pcmcia_core ehci_hcd usbcore [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted (2.6.25-rc1 #1)
EIP: 0060:[<c020bc4a>] EFLAGS: 00010086 CPU: 1
EIP is at __list_add+0x5a/0x60
EAX: 00000061 EBX: c047f380 ECX: 00000001 EDX: 00000092
ESI: c047f380 EDI: c047dea0 EBP: f7847e14 ESP: f7847e00
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=f7846000 task=f783ec20 task.ti=f7846000)
Stack: c03990d8 c047e704 00000000 ddcca118 000e09c1 f7847e24 c0131776 00000286
c047f380 f7847e38 c01323a4 c03ca0c0 03ba7837 c03ccf88 f7847ea4 c01434ca
00000286 f7847e64 c01320cc 000e0aba 114e8f6a 00000000 00140d5e 00000000
Call Trace:
[<c0131776>] ? internal_add_timer+0x36/0xb0
[<c01323a4>] ? add_timer_on+0x54/0x70
[<c01434ca>] ? clocksource_watchdog+0x22a/0x240
[<c01320cc>] ? __mod_timer+0x9c/0xb0
[<c01319fd>] ? run_timer_softirq+0x15d/0x1d0
[<c01432a0>] ? clocksource_watchdog+0x0/0x240
[<c01432a0>] ? clocksource_watchdog+0x0/0x240
[<c012d912>] ? __do_softirq+0x82/0x100
[<c012d9e8>] ? do_softirq+0x58/0x60
[<c012dd8c>] ? irq_exit+0x5c/0x70
[<c0115b38>] ? smp_apic_timer_interrupt+0x58/0x90
[<c0105a3c>] ? apic_timer_interrupt+0x28/0x30
[<c013007b>] ? sysctl_perm+0x3b/0x80
[<c011ae25>] ? native_safe_halt+0x5/0x10
[<c0103a7f>] ? default_idle+0x5f/0xa0
[<c0103a20>] ? default_idle+0x0/0xa0
[<c010388f>] ? cpu_idle+0x6f/0x110
[<c032a7e5>] ? start_secondary+0x195/0x1a0
=======================
Code: 44 24 08 c7 04 24 88 90 39 c0 e8 b2 d7 f1 ff 0f 0b eb fe 89 54 24 08 89 4c 24 04 89 44 24 0c c7 04 24 d8 90 39 c0 e8 96 d7 f1 ff <0f> 0b eb fe 66 90 8b 0a 55 89 e5 e8 96 ff ff ff 5d c3 90 90 90
EIP: [<c020bc4a>] __list_add+0x5a/0x60 SS:ESP 0068:f7847e00
Kernel panic - not syncing: Fatal exception in interrupt

Without the nohz parameter:

list_add corruption. prev->next should be next (c047e77c), but was 00000001. (prev=f644bd98).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [#1] SMP
Modules linked in: hidp rfcomm l2cap nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc af_packet binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat dm_mod piix ide_core fuse snd_pcm_oss snd_mixer_oss snd_intel8x0 hci_usb snd_ac97_codec ac97_bus parport_pc snd_pcm snd_timer thermal parport button snd processor i2c_i801 sr_mod i2c_core pcspkr rtc_cmos soundcore snd_page_alloc evdev tg3 dcdbas bluetooth iTCO_wdt iTCO_vendor_support sg ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ssb pcmcia pcmcia_core ehci_hcd usbcore [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted (2.6.25-rc1 #1)
EIP: 0060:[<c020bc4a>] EFLAGS: 00010086 CPU: 0
EIP is at __list_add+0x5a/0x60
EAX: 00000061 EBX: c03e133c ECX: 00000001 EDX: 00000086
ESI: c03e133c EDI: c047dea0 EBP: c03f5e6c ESP: c03f5e58
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c03f4000 task=c03c8300 task.ti=c03f4000)
Stack: c03990d8 c047e77c 00000001 f644bd98 00019259 c03f5e7c c0131776 c03e133c
c047dea0 c03f5e9c c01320c2 00019a28 00000000 00000286 c03e133c 00019a28
c03e12a0 c03f5eac c0132296 00000000 f7133ba4 c03f5ec4 c02beb6c 00019258
Call Trace:
[<c0131776>] ? internal_add_timer+0x36/0xb0
[<c01320c2>] ? __mod_timer+0x92/0xb0
[<c0132296>] ? mod_timer+0x26/0x40
[<c02beb6c>] ? neigh_periodic_timer+0x12c/0x160
[<c01319fd>] ? run_timer_softirq+0x15d/0x1d0
[<c02bea40>] ? neigh_periodic_timer+0x0/0x160
[<c02bea40>] ? neigh_periodic_timer+0x0/0x160
[<c012d912>] ? __do_softirq+0x82/0x100
[<c012d9e8>] ? do_softirq+0x58/0x60
[<c012dd8c>] ? irq_exit+0x5c/0x70
[<c0115b38>] ? smp_apic_timer_interrupt+0x58/0x90
[<c0105a3c>] ? apic_timer_interrupt+0x28/0x30
[<c013007b>] ? sysctl_perm+0x3b/0x80
[<c011ae25>] ? native_safe_halt+0x5/0x10
[<c0103a7f>] ? default_idle+0x5f/0xa0
[<c0103a20>] ? default_idle+0x0/0xa0
[<c010388f>] ? cpu_idle+0x6f/0x110
[<c0320809>] ? rest_init+0x49/0x50
=======================
Code: 44 24 08 c7 04 24 88 90 39 c0 e8 b2 d7 f1 ff 0f 0b eb fe 89 54 24 08 89 4c 24 04 89 44 24 0c c7 04 24 d8 90 39 c0 e8 96 d7 f1 ff <0f> 0b eb fe 66 90 8b 0a 55 89 e5 e8 96 ff ff ff 5d c3 90 90 90
EIP: [<c020bc4a>] __list_add+0x5a/0x60 SS:ESP 0068:c03f5e58
Kernel panic - not syncing: Fatal exception in interrupt
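
For readers unfamiliar with the "list_add corruption" message in the two traces above, here is a simplified Python model of the kind of sanity check CONFIG_DEBUG_LIST performs before linking a new entry. It is a sketch, not the kernel code. The class and function names are invented for this example, and the corruption is simulated by clearing the pointers of an entry that is still on the list, which is only one of several ways the real bug could arise.

```python
class ListHead:
    """Minimal stand-in for a circular doubly linked list node."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


class ListCorruption(Exception):
    pass


def checked_list_add(new, prev, nxt):
    """Insert new between prev and nxt, verifying the neighbours first."""
    if nxt.prev is not prev:
        raise ListCorruption("next->prev should be prev ({})".format(prev.name))
    if prev.next is not nxt:
        found = prev.next.name if prev.next is not None else "NULL"
        raise ListCorruption(
            "prev->next should be next ({}), but was {}. (prev={})".format(
                nxt.name, found, prev.name))
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new


def list_add_tail(new, head):
    """Append new just before head, i.e. at the tail of the list."""
    checked_list_add(new, head.prev, head)


def demo():
    vec = ListHead("vec")
    timer_a = ListHead("timer_a")
    list_add_tail(timer_a, vec)
    # Simulated bug: timer_a's memory is cleared while it is still enqueued.
    timer_a.next = None
    timer_a.prev = None
    try:
        list_add_tail(ListHead("timer_b"), vec)
    except ListCorruption as exc:
        return str(exc)
    return None


print(demo())
```

In this model the check fires on the next, unrelated insertion rather than at the moment the entry is damaged. That would match the two traces above failing in different callers (clocksource_watchdog and neigh_periodic_timer) with the same message.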

Here are all the addresses from the backtrace:

c0131776: kernel/timer.c:295
c01323a4: kernel/timer.c:454
c01434ca: kernel/time/clocksource.c:150
c01320cc: kernel/timer.c:433
c01319fd: kernel/timer.c:657
c01432a0: kernel/time/clocksource.c:98
c01432a0: kernel/time/clocksource.c:98
c012d912: include/linux/rcuclassic.h:112
c012d9e8: kernel/softirq.c:271
c012dd8c: kernel/softirq.c:314
c0115b38: include/asm/irq_regs_32.h:24
c0105a3c: include/asm-x86/mach-default/entry_arch.h:26
c013007b: kernel/sysctl.c:1520
c011ae25: include/asm/irqflags.h:49
c0103a7f: include/asm/paravirt.h:638
c0103a20: arch/x86/kernel/process_32.c:106
c010388f: arch/x86/kernel/process_32.c:190

c032a7e5: arch/x86/kernel/smpboot_32.c:426
c0320809: init/main.c:460

dmesg in case that is useful:

Linux version 2.6.25-rc1 ([email protected]) (gcc version 4.2.2 20071128 (prerelease) (4.2.2-2mdv2008.1)) #1 SMP Sat Feb 16 10:31:49 PST 2008
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fe8cc00 (usable)
BIOS-e820: 000000003fe8cc00 - 000000003fe8ec00 (ACPI NVS)
BIOS-e820: 000000003fe8ec00 - 000000003fe90c00 (ACPI data)
BIOS-e820: 000000003fe90c00 - 0000000040000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
126MB HIGHMEM available.
896MB LOWMEM available.
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
found SMP MP-table at [c00fe710] 000fe710
Entering add_active_range(0, 0, 261772) 0 entries of 256 used
Zone PFN ranges:
DMA 0 -> 4096
Normal 4096 -> 229376
HighMem 229376 -> 261772
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0 -> 261772
On node 0 totalpages: 261772
DMA zone: 32 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 4064 pages, LIFO batch:0
Normal zone: 1760 pages used for memmap
Normal zone: 223520 pages, LIFO batch:31
HighMem zone: 253 pages used for memmap
HighMem zone: 32143 pages, LIFO batch:7
Movable zone: 0 pages used for memmap
DMI 2.3 present.
ACPI: RSDP 000FEBF0, 0014 (r0 DELL )
ACPI: RSDT 000FCC96, 003C (r1 DELL 8400 7 ASL 61)
ACPI: FACP 000FCCD2, 0074 (r1 DELL 8400 7 ASL 61)
ACPI: DSDT FFFC76A0, 2B21 (r1 DELL dt_ex 1000 MSFT 100000D)
ACPI: FACS 3FE8CC00, 0040
ACPI: SSDT FFFCA1C1, 00A7 (r1 DELL st_ex 1000 MSFT 100000D)
ACPI: APIC 000FCD46, 0092 (r1 DELL 8400 7 ASL 61)
ACPI: BOOT 000FCDD8, 0028 (r1 DELL 8400 7 ASL 61)
ACPI: MCFG 000FCE00, 003E (r1 DELL 8400 7 ASL 61)
ACPI: HPET 000FCE3E, 0038 (r1 DELL 8400 7 ASL 61)
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:3 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:3 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x07] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x03] disabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x05] disabled)
ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode: Flat. Using 1 I/O APICs
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 50000000 (gap: 40000000:a0000000)
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 259727
Kernel command line: BOOT_IMAGE=2.6.25-rc1 root=/dev/sda6 resume=/dev/sda7 vga=794 console=ttyS0,115200 console=tty0
mapped APIC to ffffb000 (fee00000)
mapped IOAPIC to ffffa000 (fec00000)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 3192.109 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
console [ttyS0] enabled
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1030584k/1047088k available (2237k kernel code, 15876k reserved, 779k data, 304k init, 129584k highmem)
virtual kernel memory layout:
fixmap : 0xffe14000 - 0xfffff000 (1964 kB)
pkmap : 0xff800000 - 0xffc00000 (4096 kB)
vmalloc : 0xf8800000 - 0xff7fe000 ( 111 MB)
lowmem : 0xc0000000 - 0xf8000000 ( 896 MB)
.init : 0xc03fb000 - 0xc0447000 ( 304 kB)
.data : 0xc032f5d9 - 0xc03f2440 ( 779 kB)
.text : 0xc0100000 - 0xc032f5d9 (2237 kB)
Checking if this processor honours the WP bit even in supervisor mode...Ok.
CPA: page pool initialized 16 of 16 pages preallocated
SLUB: Genslabs=11, HWalign=128, Order=0-1, MinObjects=4, CPUs=2, Nodes=1
hpet clockevent registered
Calibrating delay using timer specific routine.. 6388.88 BogoMIPS (lpj=3194441)
Security Framework initialized
Capability LSM initialized
Mount-cache hash table entries: 512
Initializing cgroup subsys ns
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
ACPI: Core revision 20070126
CPU0: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 04
Booting processor 1/1 ip 4000
Initializing CPU#1
Calibrating delay using timer specific routine.. 6384.03 BogoMIPS (lpj=3192016)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 04
Total of 2 processors activated (12772.91 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
net_namespace: 552 bytes
Booting paravirtualized kernel on bare hardware
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using MMCONFIG for extended config space
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: EC: Look up EC in DSDT
ACPI: BIOS _OSI(Linux) query ignored
ACPI: DMI System Vendor: Dell Inc.
ACPI: DMI Product Name: Dimension 8400
ACPI: DMI Product Version:
ACPI: DMI Board Name: 0U7077
ACPI: DMI BIOS Vendor: Dell Inc.
ACPI: DMI BIOS Date: 07/07/2006
ACPI: Please send DMI info above to [email protected]
ACPI: If "acpi_osi=Linux" works better, please notify [email protected]
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:1f.0: quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.0: quirk: region 0880-08bf claimed by ICH6 GPIO
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI2._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI3._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI4._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs *3 4 5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 12 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 *5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 *9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 *10 11 12 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 13 devices
ACPI: ACPI bus type pnp unregistered
PnPBIOS: Disabled by ACPI PNP
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 11
hpet0: 3 64-bit timers, 14318180 Hz
ACPI: RTC can wake from S4
Time: tsc clocksource has been installed.
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
system 00:01: ioport range 0x800-0x85f has been reserved
system 00:01: ioport range 0xc00-0xc7f has been reserved
system 00:01: ioport range 0x860-0x8ff could not be reserved
system 00:0a: iomem range 0x0-0x9ffff could not be reserved
system 00:0a: iomem range 0x100000-0xffffff could not be reserved
system 00:0a: iomem range 0x1000000-0x3fe8cbff could not be reserved
system 00:0a: iomem range 0xc0000-0xfffff could not be reserved
system 00:0a: iomem range 0xfec00000-0xfecfffff could not be reserved
system 00:0a: iomem range 0xfee00000-0xfeefffff could not be reserved
system 00:0a: iomem range 0xfed20000-0xfed9ffff could not be reserved
system 00:0a: iomem range 0xffb00000-0xffbfffff could not be reserved
system 00:0a: iomem range 0xffc00000-0xffffffff could not be reserved
system 00:0b: iomem range 0xe0000000-0xefffffff could not be reserved
system 00:0b: iomem range 0xfeda0000-0xfedacfff has been reserved
PCI: Bridge: 0000:00:01.0
IO window: d000-dfff
MEM window: 0xdfd00000-0xdfefffff
PREFETCH window: 0x00000000d0000000-0x00000000d7ffffff
PCI: Bridge: 0000:00:1c.0
IO window: disabled.
MEM window: 0xdfc00000-0xdfcfffff
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.1
IO window: disabled.
MEM window: 0xdfb00000-0xdfbfffff
PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
IO window: disabled.
MEM window: 0xdf400000-0xdfafffff
PREFETCH window: disabled.
ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:01.0 to 64
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:1c.0 to 64
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:00:1c.1 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
checking if image is initramfs... it is
Freeing initrd memory: 2977k freed
Simple Boot Flag at 0x7a set to 0x1
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
apm: disabled - APM is not SMP safe.
audit: initializing netlink socket (disabled)
type=2000 audit(1203166128.274:1): initialized
highmem bounce pool size: 64 pages
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci 0000:01:00.0: Boot video device
PCI: Setting latency timer of device 0000:00:01.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:01.0:pcie00]
Allocate Port Service[0000:00:01.0:pcie03]
PCI: Setting latency timer of device 0000:00:1c.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.0:pcie00]
Allocate Port Service[0000:00:1c.0:pcie02]
Allocate Port Service[0000:00:1c.0:pcie03]
PCI: Setting latency timer of device 0000:00:1c.1 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[0000:00:1c.1:pcie00]
Allocate Port Service[0000:00:1c.1:pcie02]
Allocate Port Service[0000:00:1c.1:pcie03]
vesafb: framebuffer at 0xd0000000, mapped to 0xf8880000, using 5120k, total 16384k
vesafb: mode is 1280x1024x16, linelength=2560, pages=5
vesafb: protected mode interface info at c000:5890
vesafb: pmi: set display start = c00c5924, set palette = c00c5970
vesafb: pmi: ports = dc10 dc16 dc54 dc38 dc3c dc5c dc00 dc04 dcb0 dcb2 dcb4
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
Console: switching to colour frame buffer device 160x64
fb0: VESA VGA frame buffer device
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
hpet_resources: 0xfed00000 is busy
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
brd: module loaded
PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard as /class/input/input0
cpuidle: using governor ladder
cpuidle: using governor menu
TCP cubic registered
NET: Registered protocol family 1
Using IPI No-Shortcut mode
registered taskstats version 1
BIOS EDD facility v0.16 2004-Jun-25, 6 devices found
Freeing unused kernel memory: 304k freed
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: debug port 1
PCI: cache line size of 128 is not supported by device 0000:00:1d.7
ehci_hcd 0000:00:1d.7: irq 21, io mem 0xffa80800
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 21 (level, low) -> IRQ 21
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 21, io base 0x0000ff80
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 22 (level, low) -> IRQ 22
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 22, io base 0x0000ff60
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
input: ImPS/2 Logitech Wheel Mouse as /class/input/input1
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 18
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.2: irq 18, io base 0x0000ff40
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 23 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:1d.3 to 64
uhci_hcd 0000:00:1d.3: UHCI Host Controller
uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.3: irq 23, io base 0x0000ff20
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
SCSI subsystem initialized
usb 2-1: new low speed USB device using uhci_hcd and address 2
Driver 'sd' needs updating - please use bus_type methods
libata version 3.00 loaded.
ahci 0000:00:1f.2: version 3.0
ACPI: PCI Interrupt 0000:00:1f.2[C] -> GSI 20 (level, low) -> IRQ 20
usb 2-1: configuration #1 chosen from 1 choice
usb 3-1: new full speed USB device using uhci_hcd and address 2
usb 3-1: configuration #1 chosen from 1 choice
usb 3-2: new full speed USB device using uhci_hcd and address 3
usb 3-2: configuration #1 chosen from 1 choice
ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl IDE mode
ahci 0000:00:1f.2: flags: 64bit ncq pm led slum part
PCI: Setting latency timer of device 0000:00:1f.2 to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
ata1: SATA max UDMA/133 abar m1024@0xdffffc00 port 0xdffffd00 irq 20
ata2: SATA max UDMA/133 abar m1024@0xdffffc00 port 0xdffffd80 irq 20
ata3: SATA max UDMA/133 abar m1024@0xdffffc00 port 0xdffffe00 irq 20
ata4: SATA max UDMA/133 abar m1024@0xdffffc00 port 0xdffffe80 irq 20
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: HDS724040KLSA80, KFAOA20N, max UDMA/133
ata1.00: 781422768 sectors, multi 8: LBA48
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access ATA HDS724040KLSA80 KFAO PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 781422768 512-byte hardware sectors (400088 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 >
sd 0:0:0:0: [sda] Attached SCSI disk
ata_piix 0000:00:1f.1: version 2.12
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:1f.1 to 64
scsi4 : ata_piix
scsi5 : ata_piix
ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
ata5.00: ATAPI: GCR-8483B, 1.09, max UDMA/33
ata5.01: ATAPI: PHILIPS DVD+/-RW DVD8631, GD30, max UDMA/33
ata5.00: configured for UDMA/33
ata5.01: configured for UDMA/33
ata6: port disabled. ignoring.
scsi 4:0:0:0: CD-ROM HL-DT-ST CD-ROM GCR-8483B 1.09 PQ: 0 ANSI: 5
scsi 4:0:1:0: CD-ROM PHILIPS DVD+-RW DVD8631 GD30 PQ: 0 ANSI: 5
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
sd 0:0:0:0: Attached scsi generic sg0 type 0
scsi 4:0:0:0: Attached scsi generic sg1 type 5
scsi 4:0:1:0: Attached scsi generic sg2 type 5
dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.2)
tg3.c:v3.87 (December 20, 2007)
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:02:00.0 to 64
eth0: Tigon3 [partno(BCM95751) rev 4001 PHY(5750)] (PCI Express) 10/100/1000Base-T Ethernet 00:11:11:bd:d8:5d
eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] TSOcap[1]
eth0: dma_rwctrl[76180000] dma_mask[64-bit]
iTCO_vendor_support: vendor-support=0
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.02 (26-Jul-2007)
iTCO_wdt: Found a ICH6 or ICH6R TCO device (Version=2, TCOBASE=0x0860)
iTCO_wdt: initialized. heartbeat=30 sec (nowayout=1)
input: PC Speaker as /class/input/input2
Bluetooth: Core ver 2.11
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
ACPI: PCI Interrupt 0000:00:1f.3[B] -> GSI 17 (level, low) -> IRQ 17
input: Power Button (FF) as /class/input/input3
ACPI: Power Button (FF) [PWRF]
input: Power Button (CM) as /class/input/input4
ACPI: Power Button (CM) [VBTN]
ACPI: ACPI0007:00 is registered as cooling_device0
ACPI: ACPI0007:01 is registered as cooling_device1
ACPI: PCI Interrupt 0000:00:1e.2[A] -> GSI 23 (level, low) -> IRQ 23
PCI: Setting latency timer of device 0000:00:1e.2 to 64
Driver 'sr' needs updating - please use bus_type methods
sr0: scsi3-mmc drive: 48x/48x cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 4:0:0:0: Attached scsi CD-ROM sr0
sr1: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
sr 4:0:1:0: Attached scsi CD-ROM sr1
rtc_cmos 00:05: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day
parport_pc 00:08: reported by Plug and Play ACPI
parport0: PC-style at 0x378 (0x778), irq 7, using FIFO [PCSPP,TRISTATE,COMPAT,ECP]
AC'97 0 analog subsections not ready
intel8x0_measure_ac97_clock: measured 50243 usecs
intel8x0: clocking to 48000
Bluetooth: HCI USB driver ver 2.9
usbcore: registered new interface driver hci_usb
fuse init (API version 7.9)
floppy0: no floppy controllers found
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) initialised: [email protected]
EXT3 FS on sda6, internal journal
Adding 1124508k swap on /dev/sda7. Priority:-1 extents:1 across:1124508k
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda10, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda9, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
loop: module loaded

2008-02-17 00:00:07

by Thomas Gleixner

Subject: Re: Kernel oops with bluetooth usb dongle

On Sat, 16 Feb 2008, Quel Qun wrote:
> > Please also enable CONFIG_DEBUG_LIST=y, which should catch the place
> > where a list corruption happens.
>
> Thank you for the hand holding. I must admit I do not know anything
> about kernel debugging.
>
> With or without nohz=off, the crashes are very similar. It looks
> like it fails to execute list_add_tail(&timer->entry, vec), line 294
> of kernel/timer.c.

Yup, that's what I suspected. The list is corrupted:

> list_add corruption. prev->next should be next (c047e704), but was 00000000. (prev=ddcca118).

> list_add corruption. prev->next should be next (c047e77c), but was 00000001. (prev=f644bd98).

Unfortunately we only see that the list is corrupted but not which
code caused it. This looks like something forgot to delete the timer
before freeing the data structure which contains it.
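
A minimal userspace sketch of the sanity check behind the list_add corruption messages above, assuming a simplified node type. The names list_node, list_add_tail_checked and stale_node_is_detected are hypothetical; the kernel check prints the warning quoted above, while this sketch only returns a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a circular doubly linked list node (hypothetical type). */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Tail insertion with a consistency check: the current last node and the
 * head must still point at each other. Returns false where a debug build
 * would report "list_add corruption".
 */
static bool list_add_tail_checked(struct list_node *entry,
                                  struct list_node *head)
{
    struct list_node *prev = head->prev;
    struct list_node *next = head;

    if (prev->next != next || next->prev != prev)
        return false;

    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    next->prev = entry;
    return true;
}

/*
 * Models the suspected bug: a node stays linked while its owner's memory
 * is reused, so its pointers no longer describe the list.
 */
static bool stale_node_is_detected(void)
{
    struct list_node head, stale, fresh;

    list_init(&head);
    if (!list_add_tail_checked(&stale, &head))
        return false;

    /* Owner "freed" and reused without unlinking the node first. */
    stale.next = NULL;
    stale.prev = NULL;

    /* The next insertion sees prev->next == NULL instead of the head. */
    return !list_add_tail_checked(&fresh, &head);
}
```

The check can only say that a neighbour is inconsistent at insertion time. It cannot say which code left the stale node behind, which matches the limitation described here.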

Can you please enable CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y
and give it another try?

If we can not catch it that way, I'll whip up a patch which points us
to the code which added the offending timer.

Thanks,

tglx

2008-02-18 00:01:35

by kelk1

Subject: Re: Kernel oops with bluetooth usb dongle

-------------- Original message ----------------------
From: Thomas Gleixner <[email protected]>
> On Sat, 16 Feb 2008, Quel Qun wrote:
> Unfortunately we only see that the list is corrupted but not which
> code caused it. This looks like something forgot to delete the timer
> before freeing the data structure which contains it.
>
> Can you please enable CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y
> and give it another try?
>
> If we can not catch it that way, I'll whip up a patch which points us
> to the code which added the offending timer.
>
Hi,

Note: I switched to 2.6.25-rc2. The only new thing I see is this message:

hci_cmd_task: hci0 command tx timeout

This comes from net/bluetooth/hci_core.c, line 1547

There is indeed a timeout message in the log (at the end of this email). I tried to boot
with slub_debug but did not get anything more. slabinfo -v does not report anything either.

Crash log:

hci_cmd_task: hci0 command tx timeout
BUG: unable to handle kernel paging request at 6b6b6b6b
IP: [<c012d1af>] get_next_timer_interrupt+0xf6/0x1fc
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: hidp rfcomm l2cap nfsd exportfs nfs lockd nfs_acl sunrpc af_packet binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat piix ide_core fuse snd_pcm_oss snd_mixer_oss hci_usb snd_intel8x0 snd_ac97_codec ac97_bus bluetooth parport_pc i2c_i801 parport i2c_core sr_mod snd_pcm pcspkr rtc_cmos snd_timer snd iTCO_wdt soundcore iTCO_vendor_support snd_page_alloc thermal button processor tg3 evdev dcdbas sg ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted (2.6.25-rc2kk1 #1)
EIP: 0060:[<c012d1af>] EFLAGS: 00010016 CPU: 0
EIP is at get_next_timer_interrupt+0xf6/0x1fc
EAX: 6b6b6b6b EBX: 00084e8e ECX: c043067c EDX: 6b6b6b6b
ESI: 0000000e EDI: c043060c EBP: c03afee8 ESP: c03afeb0
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c03ae000 task=c03803a0 task.ti=c03ae000)
Stack: 00084e00 00084d00 c042fe00 00000000 00000001 0000000e 0000084e c043060c
c043080c c0430a0c c0430c0c c18090c0 8251f800 00084d00 c03aff2c c013f9a8
00000001 c03a2b08 00000046 c03aff20 825216b1 000000c4 8251f800 000000c4
Call Trace:
[<c013f9a8>] ? tick_nohz_stop_sched_tick+0x130/0x337
[<c01299d4>] ? irq_exit+0x55/0x6e
[<c0114717>] ? smp_apic_timer_interrupt+0x59/0x92
[<c010582c>] ? apic_timer_interrupt+0x28/0x30
[<c013007b>] ? send_group_sigqueue+0xdc/0x10e
[<c011879f>] ? native_safe_halt+0x5/0x7
[<c0103965>] ? default_idle+0x4d/0x7f
[<c0103918>] ? default_idle+0x0/0x7f
[<c01037c4>] ? cpu_idle+0x6f/0x100
[<c02e29b9>] ? rest_init+0x49/0x50
=======================
Code: 8d e0 8b 45 e0 83 e0 3f 89 45 dc 89 c6 8b 04 f7 8b 10 0f 18 02 90 8d 0c f7 39 c8 0f 84 82 00 00 00 8b 40 08 39 d8 0f 48 d8 89 d0 <8b> 12 0f 18 02 90 39 c1 75 ec c7 45 d4 01 00 00 00 8b 7d dc 85
EIP: [<c012d1af>] get_next_timer_interrupt+0xf6/0x1fc SS:ESP 0068:c03afeb0
---[ end trace 04af5dc7a2225613 ]---
Kernel panic - not syncing: Attempted to kill the idle task!

Here is the log from a run where I get some output before it crashes. It does not differ from the log when it works under 2.6.23.1:

kernel: Bluetooth: L2CAP socket layer initialized
bluetooth: Starting bluetooth service
hcid[5014]: Bluetooth HCI daemon
hcid[5014]: HCI dev 0 registered
hcid[5014]: HCI dev 0 already up
hcid[5014]: Device hci0 has been added
hcid[5014]: Starting security manager 0
kernel: Bluetooth: RFCOMM socket layer initialized
kernel: Bluetooth: RFCOMM TTY layer initialized
kernel: Bluetooth: RFCOMM ver 1.8
bluetooth: Starting hidd
kernel: Bluetooth: HIDP (Human Interface Emulation) ver 1.2
hidd[5056]: Bluetooth HID daemon
hcid[5014]: Can't read version info for hci0: Connection timed out (110)
hcid[5014]: Starting SDP server
hcid[5014]: Created local server at unix:abstract=/var/run/dbus-oaCBrCPPW2,guid=200dae308c72ef5021d6344847b8c631

Sorry for the meager yield.
--
kk1

2008-02-18 12:44:19

by Sebastian Siewior

Subject: Re: Kernel oops with bluetooth usb dongle

* Quel Qun | 2008-02-18 00:01:21 [+0000]:

Please send me your .config file and process list (ps uax > ps_list)
after the crash. I have a dongle with the same usb id as yours and I
can't reproduce the crash. So it is either some .config magic or one of
your programs is accessing the dongle.

Sebastian

2008-02-18 13:13:12

by Thomas Gleixner

Subject: Re: Kernel oops with bluetooth usb dongle

On Mon, 18 Feb 2008, Quel Qun wrote:

Added bluetooth wizards to CC

> > Can you please enable CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y
> > and give it another try?
> >
> > If we can not catch it that way, I'll whip up a patch which points us
> > to the code which added the offending timer.
> >
> Hi,
>
> Note: I switched to 2.6.25-rc2. The only new thing I see is this message:
>
> hci_cmd_task: hci0 command tx timeout
>
> This comes from net/bluetooth/hci_core.c, line 1547
>
> There is indeed a timeout message in the log (at the end of this
> email). I tried to boot with slub_debug but did not get anything
> more. slabinfo -v does not report anything either.
>
> Crash log:
>
> hci_cmd_task: hci0 command tx timeout
> BUG: unable to handle kernel paging request at 6b6b6b6b

We got some more info ---------------------------^^^^^^^^
#define POISON_FREE 0x6b /* for use-after-free poisoning */

So the timer is in an allocated data structure, which is
freed without having removed the timer first.
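
To see why the faulting address is 6b6b6b6b, here is a small userspace sketch, assuming 32-bit pointers modelled as uint32_t fields. fake_timer, poison_object and next_after_free are hypothetical names; only the POISON_FREE value comes from the define quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b /* value quoted above */

/* Hypothetical stand-in for a timer embedded in a freed object.
 * Pointers are modelled as 32-bit integers, as on the i386 machine here. */
struct fake_timer {
    uint32_t next;
    uint32_t prev;
    uint32_t expires;
};

/* Models what a debugging allocator does to an object when it is freed. */
static void poison_object(void *obj, size_t size)
{
    memset(obj, POISON_FREE, size);
}

/* Returns the "next" pointer a list walker would read from the freed timer. */
static uint32_t next_after_free(void)
{
    struct fake_timer t = {
        .next = 0xc043060cu,    /* plausible list addresses before the free */
        .prev = 0xc043060cu,
        .expires = 0x00084e8eu,
    };

    poison_object(&t, sizeof(t));
    return t.next;
}
```

next_after_free() returns 0x6b6b6b6b, the same value the oops shows in EAX and EDX when the timer code follows the stale pointer.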

> Sorry for the meager yield.

Hey, we know already more :)

Marcel, any idea on this one ?

Thanks,
tglx

2008-02-19 21:07:28

by Marcel Holtmann

Subject: Re: Kernel oops with bluetooth usb dongle

Hi Thomas,

>>> Can you please enable CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y
>>> and give it another try?
>>>
>>> If we can not catch it that way, I'll whip up a patch which points us
>>> to the code which added the offending timer.
>>>
>> Hi,
>>
>> Note: I switched to 2.6.25-rc2. The only new thing I see is this
>> message:
>>
>> hci_cmd_task: hci0 command tx timeout
>>
>> This comes from net/bluetooth/hci_core.c, line 1547
>>
>> There is indeed a timeout message in the log (at the end of this
>> email). I tried to boot with slub_debug but did not get anything
>> more. slabinfo -v does not report anything either.
>>
>> Crash log:
>>
>> hci_cmd_task: hci0 command tx timeout
>> BUG: unable to handle kernel paging request at 6b6b6b6b
>
> We got some more info ---------------------------^^^^^^^^
> #define POISON_FREE 0x6b /* for use-after-free poisoning */
>
> So the timer is in an allocated data structure, which is
> freed without having removed the timer first.
>
>> Sorry for the meager yield.
>
> Hey, we know already more :)
>
> Marcel, any idea on this one ?

I don't really have any idea. Nothing has been changed in this area
for a couple of years. The command TX timeout is the timeout that
indicates a missing answer to a command sent down to the Bluetooth chip.

However this involves some atomic and tasklet stuff. Did we have some
changes that I missed and that might now render this usage broken?

Regards

Marcel

2008-02-20 00:16:05

by Thomas Gleixner

Subject: Re: Kernel oops with bluetooth usb dongle

On Tue, 19 Feb 2008, Marcel Holtmann wrote:
> > > hci_cmd_task: hci0 command tx timeout
> > > BUG: unable to handle kernel paging request at 6b6b6b6b
> >
> > We got some more info ---------------------------^^^^^^^^
> > #define POISON_FREE 0x6b /* for use-after-free poisoning */
> >
> > So the timer is in an allocated data structure, which is
> > freed without having removed the timer first.
> >
> > > Sorry for the meager yield.
> >
> > Hey, we know already more :)
> >
> > Marcel, any idea on this one ?
>
> I don't really have any idea. Nothing has been changed in this area for a
> couple of years. The command TX timeout is the timeout that indicates a
> missing answer to a command sent down to the Bluetooth chip.
>
> However this involves some atomic and tasklet stuff. Did we have some changes
> that I missed and that might now render this usage broken?

Not that I'm aware of, but this might as well be some old use after
free bug which got exposed by some unrelated change. The good news is
that it is reproducible. I'll hack up some nasty debug patch which
lets us - hopefully - decode where the timer was armed.
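
One way such a debug aid could work, sketched in userspace under loud assumptions: every arm operation records its call site, so a timer found in a bad state can be traced back to the code that armed it. dbg_timer, dbg_timer_arm, dbg_timer_disarm and dbg_timer_safe_to_free are hypothetical names and not the patch discussed here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer model that remembers where it was last armed. */
struct dbg_timer {
    unsigned long expires;
    bool pending;
    const char *armed_file; /* source file of the last arm call */
    int armed_line;         /* source line of the last arm call */
};

static void dbg_timer_arm_at(struct dbg_timer *t, unsigned long expires,
                             const char *file, int line)
{
    t->expires = expires;
    t->pending = true;
    t->armed_file = file;
    t->armed_line = line;
}

/* Wrapper macro so callers do not pass the call site by hand. */
#define dbg_timer_arm(t, expires) \
    dbg_timer_arm_at((t), (expires), __FILE__, __LINE__)

/* Returns true if the timer was pending and has now been cancelled. */
static bool dbg_timer_disarm(struct dbg_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* The owner may only be freed once no timer is pending any more. */
static bool dbg_timer_safe_to_free(const struct dbg_timer *t)
{
    return !t->pending;
}
```

A free path would call dbg_timer_safe_to_free() first and, when it returns false, report armed_file and armed_line to name the code that left the timer pending.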

Thanks,

tglx

2008-02-20 08:11:47

by Thomas Gleixner

Subject: Re: Kernel oops with bluetooth usb dongle

On Wed, 20 Feb 2008, Thomas Gleixner wrote:
> On Tue, 19 Feb 2008, Marcel Holtmann wrote:
> > I don't really have any idea. Nothing has been changed in this area for a
> > couple of years. The command TX timeout is the timeout that indicates a
> > missing answer to a command sent down to the Bluetooth chip.
> >
> > However this involves some atomic and tasklet stuff. Did we have some changes
> > that I missed and that might now render this usage broken?
>
> Not that I'm aware of, but this might as well be some old use after
> free bug which got exposed by some unrelated change. The good news is
> that it is reproducible. I'll hack up some nasty debug patch which
> lets us - hopefully - decode where the timer was armed.

Quel, before I do that, is there any chance that you retest with the
latest mainline git version ?

http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.25-rc2-git4.bz2

This is the delta which applies on top of rc2.

Thanks,

tglx

2008-02-21 07:31:17

by Dave Young

Subject: Re: Kernel oops with bluetooth usb dongle

On Wed, Feb 20, 2008 at 4:11 PM, Thomas Gleixner <[email protected]> wrote:
> On Wed, 20 Feb 2008, Thomas Gleixner wrote:
> > On Tue, 19 Feb 2008, Marcel Holtmann wrote:
>
> > > I don't really have any idea. Nothing has been changed in this area for a
> > > couple of years. The command TX timeout is the timeout that indicates a
> > > missing answer to a command sent down to the Bluetooth chip.
> > >
> > > > However this involves some atomic and tasklet stuff. Did we have some changes
> > > > that I missed and that might now render this usage broken?
> >
> > > Not that I'm aware of, but this might as well be some old use after
> > free bug which got exposed by some unrelated change. The good news is
> > that it is reproducible. I'll hack up some nasty debug patch which
> > lets us - hopefully - decode where the timer was armed.
>
> Quel, before I do that, is there any chance that you retest with the
> latest mainline git version ?
>
> http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.25-rc2-git4.bz2

And please test with this patch as well:

http://lkml.org/lkml/2008/2/20/121

>
> Is the delta which applies on top of rc2.
>
>
>
> Thanks,
>
> tglx

2008-02-21 16:49:49

by kelk1

Subject: Re: Kernel oops with bluetooth usb dongle


-------------- Original message ----------------------
From: "Dave Young" <[email protected]>
> On Wed, Feb 20, 2008 at 4:11 PM, Thomas Gleixner <[email protected]> wrote:
> > On Wed, 20 Feb 2008, Thomas Gleixner wrote:
> > > On Tue, 19 Feb 2008, Marcel Holtmann wrote:
> >
> > > > I don't really have any idea. Nothing has been changed in this area for a
> > > > couple of years. The command TX timeout is the timeout that indicates a
> > > > missing answer to a command sent down to the Bluetooth chip.
> > > >
> > > > However this involves some atomic and tasklet stuff. Did we have some
> changes
> > > > that I missed and might now render this usage as broken.
> > >
> > > Not that I'm aware off, but this might as well be some old use after
> > > free bug which got exposed by some unrelated change. The good news is
> > > that it is reproducible. I'll hack up some nasty debug patch which
> > > lets us - hopefully - decode where the timer was armed.
> >
> > Quel, before I do that, is there any chance that you retest with the
> > latest mainline git version ?
> >
> >
> http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.25-rc2-git4.bz2
>
> And please test with this patch as well:
>
> http://lkml.org/lkml/2008/2/20/121
>
Same kind of result unfortunately with this last patch on top of git4:

hci_cmd_task: hci0 command tx timeout
BUG: unable to handle kernel paging request at 6b6b6b6b
IP: [<c012d22f>] get_next_timer_interrupt+0xf6/0x1fc
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: hidp rfcomm l2cap nfsd exportfs nfs lockd nfs_acl sunrpc autofs4 af_packet binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat fuse snd_pcm_oss snd_mixer_oss snd_intel8x0 hci_usb snd_ac97_codec ac97_bus snd_pcm snd_timer i2c_i801 bluetooth parport_pc sr_mod snd parport i2c_core soundcore rtc_cmos pcspkr iTCO_wdt snd_page_alloc iTCO_vendor_support thermal processor button dcdbas evdev tg3 sg ide_disk piix ide_core ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted (2.6.25-rc2-git4kk1 #1)
EIP: 0060:[<c012d22f>] EFLAGS: 00010002 CPU: 0
EIP is at get_next_timer_interrupt+0xf6/0x1fc
EAX: 6b6b6b6b EBX: 3fffa098 ECX: c0430714 EDX: 6b6b6b6b
ESI: 00000021 EDI: c043060c EBP: c03aff58 ESP: c03aff20
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c03ae000 task=c03803a0 task.ti=c03ae000)
Stack: ffffa100 ffffa098 c042fe00 00000000 00000001 00000021 00ffffa1 c043060c
c043080c c0430a0c c0430c0c c18090c0 299c0e00 ffffa098 c03aff9c c013fa28
29ab5040 c03803a0 c0380510 c180c200 299c44e8 00000040 299c0e00 00000040
Call Trace:
[<c013fa28>] ? tick_nohz_stop_sched_tick+0x130/0x337
[<c013fd2b>] ? tick_nohz_restart_sched_tick+0xfc/0x139
[<c0103918>] ? default_idle+0x0/0x7f
[<c0103789>] ? cpu_idle+0x34/0x100
[<c02e2d39>] ? rest_init+0x49/0x50
=======================
Code: 8d e0 8b 45 e0 83 e0 3f 89 45 dc 89 c6 8b 04 f7 8b 10 0f 18 02 90 8d 0c f7 39 c8 0f 84 82 00 00 00 8b 40 08 39 d8 0f 48 d8 89 d0 <8b> 12 0f 18 02 90 39 c1 75 ec c7 45 d4 01 00 00 00 8b 7d dc 85
EIP: [<c012d22f>] get_next_timer_interrupt+0xf6/0x1fc SS:ESP 0068:c03aff20
---[ end trace bb6b2d4df944b938 ]---
Kernel panic - not syncing: Attempted to kill the idle task!

# addr2line -e vmlinux c012d22f
/usr/src/linux-2.6.25-rc2kk1/kernel/timer.c:721

721: list_for_each_entry(nte, varp->vec + slot, entry) {
722: found = 1;
723: if (time_before(nte->expires, expires))
724: expires = nte->expires;
725: }

--
Eric
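
A note on the faulting address in the oops above: 0x6b is the byte the slab allocator writes over freed objects when slab poisoning is enabled (POISON_FREE in include/linux/poison.h), so a list pointer that reads 6b6b6b6b suggests the timer list walk reached memory that had already been freed. The userspace sketch below only illustrates that arithmetic. The struct and function names are made up for the example and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same value as POISON_FREE in the kernel's include/linux/poison.h. */
#define POISON_FREE_BYTE 0x6b

/* Hypothetical stand-in for a 32-bit list_head: two 4-byte "pointers". */
struct fake_list_head32 {
	uint32_t next;
	uint32_t prev;
};

/* Overwrite an object the way a poisoning allocator would on free. */
static void poison_object(void *obj, size_t size)
{
	assert(obj != NULL);
	memset(obj, POISON_FREE_BYTE, size);
}

/* Read the "next" field back after the object has been poisoned. */
static uint32_t poisoned_next(void)
{
	struct fake_list_head32 entry = {
		.next = 0xc0432764u,	/* arbitrary, looks like a valid kernel address */
		.prev = 0xc0432764u,
	};

	poison_object(&entry, sizeof(entry));
	return entry.next;
}
```

Every byte of the poisoned object is 0x6b, so any 32-bit field read from it comes back as 0x6b6b6b6b, which is the value seen in EAX and EDX in the register dump.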

2008-02-21 19:38:34

by Thomas Gleixner

Subject: Re: Kernel oops with bluetooth usb dongle

On Thu, 21 Feb 2008, Quel Qun wrote:
> > > > Not that I'm aware off, but this might as well be some old use after
> > > > free bug which got exposed by some unrelated change. The good news is
> > > > that it is reproducible. I'll hack up some nasty debug patch which
> > > > lets us - hopefully - decode where the timer was armed.
> > >
> > > Quel, before I do that, is there any chance that you retest with the
> > > latest mainline git version ?
> > >
> > >
> > http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.25-rc2-git4.bz2
> >
> > And please test with this patch as well:
> >
> > http://lkml.org/lkml/2008/2/20/121
> >
> Same kind of result unfortunately with this last patch on top of git4:

At least it is fully reproducible. Please apply the patch below to
your git4 tree and do not change your .config. The output should show
which code armed the timer.

Thanks,

tglx

Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -58,6 +58,40 @@ EXPORT_SYMBOL(jiffies_64);
 #define TVN_MASK (TVN_SIZE - 1)
 #define TVR_MASK (TVR_SIZE - 1)

+struct timer_trace {
+	struct timer_list *timer;
+	void *fn;
+	void *addr;
+};
+
+#define TTRACE_SIZE 4096
+static struct timer_trace ttrace[TTRACE_SIZE];
+static int ttrace_idx;
+static DEFINE_SPINLOCK(ttrace_lock);
+
+void ttrace_find_timer(struct timer_list *timer)
+{
+	char symname[KSYM_NAME_LEN];
+
+	unsigned long flags;
+	int i;
+
+	spin_lock_irqsave(&ttrace_lock, flags);
+	for (i = 0; i < TTRACE_SIZE; i++) {
+		if (ttrace[i].timer == timer) {
+			printk(KERN_ERR "TTRACE timer %p fn %p addr %p\n",
+			       ttrace[i].timer, ttrace[i].fn, ttrace[i].addr);
+			if (lookup_symbol_name((unsigned long) ttrace[i].fn,
+					       symname) == 0)
+				printk(KERN_ERR "TTRACE fn %s\n", symname);
+			if (lookup_symbol_name((unsigned long) ttrace[i].addr,
+					       symname) == 0)
+				printk(KERN_ERR "TTRACE addr %s\n", symname);
+		}
+	}
+	spin_unlock_irqrestore(&ttrace_lock, flags);
+}
+
 struct tvec {
 	struct list_head vec[TVN_SIZE];
 };
@@ -395,6 +429,13 @@ int __mod_timer(struct timer_list *timer
 	unsigned long flags;
 	int ret = 0;

+	spin_lock_irqsave(&ttrace_lock, flags);
+	ttrace[ttrace_idx].timer = timer;
+	ttrace[ttrace_idx].fn = timer->function;
+	ttrace[ttrace_idx].addr = __builtin_return_address(0);
+	ttrace_idx = (ttrace_idx + 1) & (TTRACE_SIZE - 1);
+	spin_unlock_irqrestore(&ttrace_lock, flags);
+
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(!timer->function);

@@ -687,6 +728,14 @@ static unsigned long __next_timer_interr
 	/* Look for timer events in tv1. */
 	index = slot = timer_jiffies & TVR_MASK;
 	do {
+		struct list_head *tmp;
+
+		__list_for_each(tmp, base->tv1.vec + slot) {
+			nte = (struct timer_list *) tmp;
+			if (nte->entry.next == (void *)0x6b6b6b6b)
+				ttrace_find_timer(nte);
+		}
+
 		list_for_each_entry(nte, base->tv1.vec + slot, entry) {
 			if (tbase_get_deferrable(nte->base))
 				continue;
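
The core of the debug patch above is a fixed-size ring buffer that remembers who armed each timer, indexed with a power-of-two mask so the index wraps without a division. A minimal userspace sketch of that idea follows. It has no locking, it returns only the most recent match (the patch prints every match), and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_SIZE 8	/* must be a power of two for the mask below */

struct trace_entry {
	const void *obj;	/* object that was armed */
	const void *caller;	/* who armed it */
};

static struct trace_entry trace[TRACE_SIZE];
static unsigned int trace_idx;

/* Record one arming event, overwriting the oldest slot when full. */
static void trace_record(const void *obj, const void *caller)
{
	assert((TRACE_SIZE & (TRACE_SIZE - 1)) == 0);

	trace[trace_idx].obj = obj;
	trace[trace_idx].caller = caller;
	trace_idx = (trace_idx + 1) & (TRACE_SIZE - 1);
}

/* Return the most recent caller recorded for obj, or NULL if none is left. */
static const void *trace_find(const void *obj)
{
	unsigned int i;

	/* Walk backwards from the newest slot to the oldest one. */
	for (i = 1; i <= TRACE_SIZE; i++) {
		unsigned int slot = (trace_idx - i) & (TRACE_SIZE - 1);

		if (trace[slot].obj == obj)
			return trace[slot].caller;
	}
	return NULL;
}
```

Because the buffer overwrites old entries, a lookup can miss a timer that was armed long ago. The patch sizes its buffer at 4096 entries, presumably to make that unlikely during a short reproduction run.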

2008-02-22 02:40:59

by kelk1

Subject: Re: Kernel oops with bluetooth usb dongle


-------------- Original message ----------------------
From: Thomas Gleixner <[email protected]>
> On Thu, 21 Feb 2008, Quel Qun wrote:
> > > > > Not that I'm aware off, but this might as well be some old use after
> > > > > free bug which got exposed by some unrelated change. The good news is
> > > > > that it is reproducible. I'll hack up some nasty debug patch which
> > > > > lets us - hopefully - decode where the timer was armed.
> > > >
> > > > Quel, before I do that, is there any chance that you retest with the
> > > > latest mainline git version ?
> > > >
> > > >
> > >
> http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.25-rc2-git4.bz2
> > >
> > > And please test with this patch as well:
> > >
> > > http://lkml.org/lkml/2008/2/20/121
> > >
> > Same kind of result unfortunately with this last patch on top of git4:
>
> At least it is fully reproducible. Please apply the patch below to
> your git4 tree and do not change your .config. The output should show,
> which code armed the timer.
>
Thomas,

Thanks for the patch, but that did not work, I never got the trace.

I switched to git5 and applied the patch.

First crash (= attached kernlog.9) showed some hald process, so I decided to reduce the number of services and processes to a minimum. Attached are the process list from before starting sdptool browse and crashing, the list of modules and the list of services.

Second crash:

BUG: unable to handle kernel paging request at 6b6b6b6b
IP: [<c012d51d>] get_next_timer_interrupt+0x11f/0x234
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: hidp rfcomm l2cap nfsd exportfs nfs lockd nfs_acl sunrpc autofs4 af_packet binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat fuse snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec hci_usb ac97_bus snd_pcm parport_pc snd_timer snd sr_mod i2c_i801 rtc_cmos iTCO_wdt i2c_core parport soundcore iTCO_vendor_support pcspkr snd_page_alloc bluetooth button thermal processor evdev dcdbas tg3 sg ide_disk piix ide_core ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted (2.6.25-rc2-git5kk1 #1)
EIP: 0060:[<c012d51d>] EFLAGS: 00010002 CPU: 0
EIP is at get_next_timer_interrupt+0x11f/0x234
EAX: 6b6b6b6b EBX: 3ffda6f6 ECX: c0432744 EDX: 6b6b6b6b
ESI: 00000027 EDI: c043260c EBP: c03b1ee8 ESP: c03b1eac
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c03b0000 task=c03813a0 task.ti=c03b0000)
Stack: fffda6f6 c0431e00 00000000 fffda700 00000001 000000f7 00000027 00fffda7
c043260c c043280c c0432a0c c0432c0c c18090c0 0643e180 fffda6f6 c03b1f2c
c013fb78 00000001 c03a3b08 00000046 c03b1f20 0644b3c1 00000022 0643e180
Call Trace:
[<c013fb78>] ? tick_nohz_stop_sched_tick+0x130/0x337
[<c0129a54>] ? irq_exit+0x55/0x6e
[<c01146e7>] ? smp_apic_timer_interrupt+0x59/0x92
[<c010582c>] ? apic_timer_interrupt+0x28/0x30
[<c013007b>] ? get_signal_to_deliver+0x2d8/0x332
[<c011879f>] ? native_safe_halt+0x5/0x7
[<c0103965>] ? default_idle+0x4d/0x7f
[<c0103918>] ? default_idle+0x0/0x7f
[<c01037c4>] ? cpu_idle+0x6f/0x100
[<c02e2e89>] ? rest_init+0x49/0x50
=======================
Code: 85 e0 8b 4d e0 83 e1 3f 89 4d dc 89 ce 8b 04 f7 8b 10 0f 18 02 90 8d 0c f7 39 c8 0f 84 8d 00 00 00 8b 40 08 39 d8 0f 48 d8 89 d0 <8b> 12 0f 18 02 90 39 c8 75 ec c7 45 cc 01 00 00 00 8b 7d dc 85
EIP: [<c012d51d>] get_next_timer_interrupt+0x11f/0x234

$ addr2line -e vmlinux c012d51d
/usr/src/linux-2.6.25-rc2-git5kk1/kernel/timer.c:770

Crap, that is on the next list_for_each_entry in timer.c :(

I tried to make a similar test loop as you did a few lines above:

@@ -718,6 +767,14 @@

 		index = slot = timer_jiffies & TVN_MASK;
 		do {
+			struct list_head *tmp;
+
+			__list_for_each(tmp, varp->vec + slot) {
+				nte = (struct timer_list *) tmp;
+				if (nte->entry.next == (void *)0x6b6b6b6b)
+					ttrace_find_timer(nte);
+			}
+
 			list_for_each_entry(nte, varp->vec + slot, entry) {
 				found = 1;
 				if (time_before(nte->expires, expires))

I thought I got it on the next crash, but the system locked too fast, and the only thing I saw was:

TTRACE timer f7b52858 fn f8e7c608 addr c012d776
TTRACE fn l2cap_info_timeout
TTRACE addr mod_timer
BUG: unable to handle kernel paging request at 6b6b6b6b
IP:

$ addr2line -e vmlinux.kk1 c012d776
/usr/src/linux-2.6.25-rc2-git5kk1/kernel/timer.c:533

int mod_timer(struct timer_list *timer, unsigned long expires)
{
	BUG_ON(!timer->function);

	timer_stats_timer_set_start_info(timer);
	/*
	 * This is a common optimization triggered by the
	 * networking code - if the timer is re-modified
	 * to be the same thing then just return:
	 */
	if (timer->expires == expires && timer_pending(timer))
		return 1;

	return __mod_timer(timer, expires);
} <<<< line 533 is here

Unfortunately, I never got anything more. After that, the only thing I got, even without my changes, was:

list_add corruption. prev->next should be next (c0432764), but was 6b6b6b6b. (prev=f6d6e908).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [#1] SMP
Modules linked in: hidp rfcomm l2cap binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat fuse snd_pcm_oss snd_mixer_oss snd_intel8x0 sr_mod snd_ac97_codec ac97_bus snd_pcm parport_pc snd_timer parport snd soundcore i2c_i801 i2c_core hci_usb snd_page_alloc bluetooth rtc_cmos pcspkr iTCO_wdt iTCO_vendor_support tg3 processor button evdev dcdbas sg ide_disk piix ide_core ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: scsi_wait_scan]

Pid: 8, comm: events/0 Not tainted (2.6.25-rc2-git5kk1 #3)
EIP: 0060:[<c01f5703>] EFLAGS: 00010086 CPU: 0
EIP is at __list_add+0x5a/0x5e
EAX: 00000061 EBX: c18093d0 ECX: 00000001 EDX: 00000096
ESI: c18093d0 EDI: c0431e00 EBP: f788defc ESP: f788dee8
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process events/0 (pid: 8, ti=f788c000 task=f785cc60 task.ti=f788c000)
Stack: c035609c c0432764 6b6b6b6b f6d6e908 fffd277d f788df0c c012cba2 c18093d0
c0431e00 f788df2c c012d2cc fffd2b64 00000000 00000286 c18093c0 c18093d0
f7811690 f788df44 c0133ee6 ffffffff c18093c0 000003e8 f7811690 f788df5c
Call Trace:
[<c012cba2>] ? internal_add_timer+0x53/0xb4
[<c012d2cc>] ? __mod_timer+0xe9/0x102
[<c0133ee6>] ? queue_delayed_work_on+0x84/0xb7
[<c0133f71>] ? queue_delayed_work+0x40/0x48
[<c0169025>] ? vmstat_update+0x0/0x28
[<c0133f9b>] ? schedule_delayed_work+0x22/0x26
[<c016904b>] ? vmstat_update+0x26/0x28
[<c01337d4>] ? run_workqueue+0xc1/0x150
[<c0134022>] ? worker_thread+0x83/0xd9
[<c013676e>] ? autoremove_wake_function+0x0/0x38
[<c0133f9f>] ? worker_thread+0x0/0xd9
[<c01364c4>] ? kthread+0x37/0x59
[<c013648d>] ? kthread+0x0/0x59
[<c01059bb>] ? kernel_thread_helper+0x7/0x1c
=======================
Code: 54 24 04 c7 04 24 4c 60 35 c0 e8 f1 fe f2 ff 0f 0b eb fe 89 44 24 0c 89 54 24 08 89 4c 24 04 c7 04 24 9c 60 35 c0 e8 d5 fe f2 ff <0f> 0b eb fe 55 89 e5 8b 0a e8 98 ff ff ff 5d c3 90 55 89 e5 53
EIP: [<c01f5703>] __list_add+0x5a/0x5e SS:ESP 0068:f788dee8
---[ end trace bd4e31c9ceb47c4f ]---

I hope the tiny bit of trace can trigger some idea. At least l2cap has something to do with bluetooth. l2cap_info_timeout is line 360 of net/bluetooth/l2cap.c, apparently only called from l2cap_conn_add, line 391: setup_timer(&conn->info_timer, l2cap_info_timeout, (unsigned long)conn);

After four hours and ten crashes today, it is the little I got. Kernel stuff is tough...
--
kk1


Attachments:
mod_list (1.39 kB)
ps_list (4.08 kB)
serv_list (327.00 B)
kernlog.9 (3.39 kB)
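
The "list_add corruption" message in the last oops comes from the list debugging check in lib/list_debug.c, which refuses to insert a node when the two neighbours no longer point at each other. The sketch below models that check in userspace under simplifying assumptions: it returns false instead of calling BUG(), and the type and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct list_node {
	struct list_node *next, *prev;
};

/*
 * Insert new_node between prev and next, but only if the two
 * neighbours still reference each other. Returns false and leaves
 * the list untouched when the links are inconsistent.
 */
static bool checked_list_add(struct list_node *new_node,
			     struct list_node *prev,
			     struct list_node *next)
{
	assert(new_node && prev && next);

	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption: next->prev mismatch\n");
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "list_add corruption: prev->next mismatch\n");
		return false;
	}

	next->prev = new_node;
	new_node->next = next;
	new_node->prev = prev;
	prev->next = new_node;
	return true;
}
```

In the oops, prev is f6d6e908 and its next field holds 6b6b6b6b, so the second comparison fails: the node that is supposed to precede the new timer looks like it was freed while still linked.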

2008-02-22 03:18:35

by Dave Young

Subject: Re: Kernel oops with bluetooth usb dongle

On Fri, Feb 22, 2008 at 02:40:41AM +0000, Quel Qun wrote:
>
> -------------- Original message ----------------------
> From: Thomas Gleixner <[email protected]>
> > On Thu, 21 Feb 2008, Quel Qun wrote:
> > > > > > Not that I'm aware off, but this might as well be some old use after
> > > > > > free bug which got exposed by some unrelated change. The good news is
> > > > > > that it is reproducible. I'll hack up some nasty debug patch which
> > > > > > lets us - hopefully - decode where the timer was armed.
> > > > >
> > > > > Quel, before I do that, is there any chance that you retest with the
> > > > > latest mainline git version ?
> > > > >
> > > > >
> > > >
> > http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.25-rc2-git4.bz2
> > > >
> > > > And please test with this patch as well:
> > > >
> > > > http://lkml.org/lkml/2008/2/20/121
> > > >
> > > Same kind of result unfortunately with this last patch on top of git4:
> >
> > At least it is fully reproducible. Please apply the patch below to
> > your git4 tree and do not change your .config. The output should show,
> > which code armed the timer.
> >
> Thomas,
>
> Thanks for the patch, but that did not work, I never got the trace.
>
> I switched to git5 and applied the patch.
>
> First crash (= attached kernlog.9) showed some hald process, so I decided to reduce the number of services and processes to a maximum. Attached are process list before starting sdptool browse and crashing, list of modules and list of services.
>
> Second crash:
>
> BUG: unable to handle kernel paging request at 6b6b6b6b
> IP: [<c012d51d>] get_next_timer_interrupt+0x11f/0x234
> *pde = 00000000
> Oops: 0000 [#1] SMP
> Modules linked in: hidp rfcomm l2cap nfsd exportfs nfs lockd nfs_acl sunrpc autofs4 af_packet binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat fuse snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec hci_usb ac97_bus snd_pcm parport_pc snd_timer snd sr_mod i2c_i801 rtc_cmos iTCO_wdt i2c_core parport soundcore iTCO_vendor_support pcspkr snd_page_alloc bluetooth button thermal processor evdev dcdbas tg3 sg ide_disk piix ide_core ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: scsi_wait_scan]
>
> Pid: 0, comm: swapper Not tainted (2.6.25-rc2-git5kk1 #1)
> EIP: 0060:[<c012d51d>] EFLAGS: 00010002 CPU: 0
> EIP is at get_next_timer_interrupt+0x11f/0x234
> EAX: 6b6b6b6b EBX: 3ffda6f6 ECX: c0432744 EDX: 6b6b6b6b
> ESI: 00000027 EDI: c043260c EBP: c03b1ee8 ESP: c03b1eac
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process swapper (pid: 0, ti=c03b0000 task=c03813a0 task.ti=c03b0000)
> Stack: fffda6f6 c0431e00 00000000 fffda700 00000001 000000f7 00000027 00fffda7
> c043260c c043280c c0432a0c c0432c0c c18090c0 0643e180 fffda6f6 c03b1f2c
> c013fb78 00000001 c03a3b08 00000046 c03b1f20 0644b3c1 00000022 0643e180
> Call Trace:
> [<c013fb78>] ? tick_nohz_stop_sched_tick+0x130/0x337
> [<c0129a54>] ? irq_exit+0x55/0x6e
> [<c01146e7>] ? smp_apic_timer_interrupt+0x59/0x92
> [<c010582c>] ? apic_timer_interrupt+0x28/0x30
> [<c013007b>] ? get_signal_to_deliver+0x2d8/0x332
> [<c011879f>] ? native_safe_halt+0x5/0x7
> [<c0103965>] ? default_idle+0x4d/0x7f
> [<c0103918>] ? default_idle+0x0/0x7f
> [<c01037c4>] ? cpu_idle+0x6f/0x100
> [<c02e2e89>] ? rest_init+0x49/0x50
> =======================
> Code: 85 e0 8b 4d e0 83 e1 3f 89 4d dc 89 ce 8b 04 f7 8b 10 0f 18 02 90 8d 0c f7 39 c8 0f 84 8d 00 00 00 8b 40 08 39 d8 0f 48 d8 89 d0 <8b> 12 0f 18 02 90 39 c8 75 ec c7 45 cc 01 00 00 00 8b 7d dc 85
> EIP: [<c012d51d>] get_next_timer_interrupt+0x11f/0x234
>
> $ addr2line -e vmlinux c012d51d
> /usr/src/linux-2.6.25-rc2-git5kk1/kernel/timer.c:770
>
> Crap, that is on the next list_for_each_entry in timer.c :(
>
> I tried to make a similar test loop as you did a few lines above:
>
> @@ -718,6 +767,14 @@
>
> index = slot = timer_jiffies & TVN_MASK;
> do {
> + struct list_head *tmp;
> +
> + __list_for_each(tmp, varp->vec + slot) {
> + nte = (struct timer_list *) tmp;
> + if (nte->entry.next == (void *)0x6b6b6b6b)
> + ttrace_find_timer(nte);
> + }
> +
> list_for_each_entry(nte, varp->vec + slot, entry) {
> found = 1;
> if (time_before(nte->expires, expires))
>
> I thought I got it on the next crash, but the system locked too fast, and the only thing I saw was:
>
> TTRACE timer f7b52858 fn f8e7c608 addr c012d776
> TTRACE fn l2cap_info_timeout
> TTRACE addr mod_timer
> BUG: unable to handle kernel paging request at 6b6b6b6b
> IP:
>
> $ addr2line -e vmlinux.kk1 c012d776
> /usr/src/linux-2.6.25-rc2-git5kk1/kernel/timer.c:533
>
> int mod_timer(struct timer_list *timer, unsigned long expires)
> {
> BUG_ON(!timer->function);
>
> timer_stats_timer_set_start_info(timer);
> /*
> * This is a common optimization triggered by the
> * networking code - if the timer is re-modified
> * to be the same thing then just return:
> */
> if (timer->expires == expires && timer_pending(timer))
> return 1;
>
> return __mod_timer(timer, expires);
> } <<<< line 533 is here
>
> Unfortunately, I never got anything more. After that, the only thing I got, even without my changes, was:
>
> list_add corruption. prev->next should be next (c0432764), but was 6b6b6b6b. (prev=f6d6e908).
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:33!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: hidp rfcomm l2cap binfmt_misc loop nls_iso8859_1 nls_cp437 vfat fat fuse snd_pcm_oss snd_mixer_oss snd_intel8x0 sr_mod snd_ac97_codec ac97_bus snd_pcm parport_pc snd_timer parport snd soundcore i2c_i801 i2c_core hci_usb snd_page_alloc bluetooth rtc_cmos pcspkr iTCO_wdt iTCO_vendor_support tg3 processor button evdev dcdbas sg ide_disk piix ide_core ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: scsi_wait_scan]
>
> Pid: 8, comm: events/0 Not tainted (2.6.25-rc2-git5kk1 #3)
> EIP: 0060:[<c01f5703>] EFLAGS: 00010086 CPU: 0
> EIP is at __list_add+0x5a/0x5e
> EAX: 00000061 EBX: c18093d0 ECX: 00000001 EDX: 00000096
> ESI: c18093d0 EDI: c0431e00 EBP: f788defc ESP: f788dee8
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process events/0 (pid: 8, ti=f788c000 task=f785cc60 task.ti=f788c000)
> Stack: c035609c c0432764 6b6b6b6b f6d6e908 fffd277d f788df0c c012cba2 c18093d0
> c0431e00 f788df2c c012d2cc fffd2b64 00000000 00000286 c18093c0 c18093d0
> f7811690 f788df44 c0133ee6 ffffffff c18093c0 000003e8 f7811690 f788df5c
> Call Trace:
> [<c012cba2>] ? internal_add_timer+0x53/0xb4
> [<c012d2cc>] ? __mod_timer+0xe9/0x102
> [<c0133ee6>] ? queue_delayed_work_on+0x84/0xb7
> [<c0133f71>] ? queue_delayed_work+0x40/0x48
> [<c0169025>] ? vmstat_update+0x0/0x28
> [<c0133f9b>] ? schedule_delayed_work+0x22/0x26
> [<c016904b>] ? vmstat_update+0x26/0x28
> [<c01337d4>] ? run_workqueue+0xc1/0x150
> [<c0134022>] ? worker_thread+0x83/0xd9
> [<c013676e>] ? autoremove_wake_function+0x0/0x38
> [<c0133f9f>] ? worker_thread+0x0/0xd9
> [<c01364c4>] ? kthread+0x37/0x59
> [<c013648d>] ? kthread+0x0/0x59
> [<c01059bb>] ? kernel_thread_helper+0x7/0x1c
> =======================
> Code: 54 24 04 c7 04 24 4c 60 35 c0 e8 f1 fe f2 ff 0f 0b eb fe 89 44 24 0c 89 54 24 08 89 4c 24 04 c7 04 24 9c 60 35 c0 e8 d5 fe f2 ff <0f> 0b eb fe 55 89 e5 8b 0a e8 98 ff ff ff 5d c3 90 55 89 e5 53
> EIP: [<c01f5703>] __list_add+0x5a/0x5e SS:ESP 0068:f788dee8
> ---[ end trace bd4e31c9ceb47c4f ]---
>
> I hope the tiny bit of trace can trigger some idea. At least l2cap has something to do with bluetooth. l2cap_info_timeout is line 360 of net/bluetooth/l2cap.c, apparently only called from l2cap_conn_add, line 391: setup_timer(&conn->info_timer, l2cap_info_timeout, (unsigned long)conn);
>
> After four hours and ten crashes today, it is the little I got. Kernel stuff is tough...
> --
> kk1

Could you try the following patch? I'm not sure whether putting del_timer
here is safe or not.

diff -upr linux/net/bluetooth/l2cap.c linux.new/net/bluetooth/l2cap.c
--- linux/net/bluetooth/l2cap.c 2008-02-22 11:11:33.000000000 +0800
+++ linux.new/net/bluetooth/l2cap.c 2008-02-22 11:14:12.000000000 +0800
@@ -418,6 +418,7 @@ static void l2cap_conn_del(struct hci_co
}

hcon->l2cap_data = NULL;
+ del_timer(&conn->info_timer);
kfree(conn);
}

2008-02-22 07:23:41

by Thomas Gleixner

Subject: Re: Kernel oops with bluetooth usb dongle

Quel,

On Fri, 22 Feb 2008, Quel Qun wrote:
> $ addr2line -e vmlinux c012d51d
> /usr/src/linux-2.6.25-rc2-git5kk1/kernel/timer.c:770
>
> Crap, that is on the next list_for_each_entry in timer.c :(
>
> I tried to make a similar test loop as you did a few lines above:

Cool.

> I thought I got it on the next crash, but the system locked too
> fast, and the only thing I saw was:
>
> TTRACE timer f7b52858 fn f8e7c608 addr c012d776
> TTRACE fn l2cap_info_timeout
> TTRACE addr mod_timer
> BUG: unable to handle kernel paging request at 6b6b6b6b

That's what I wanted to see.

> I hope the tiny bit of trace can trigger some idea. At least l2cap
> has something to do with bluetooth. l2cap_info_timeout is line 360
> of net/bluetooth/l2cap.c, apparently only called from
> l2cap_conn_add, line 391: setup_timer(&conn->info_timer,
> l2cap_info_timeout, (unsigned long)conn);

Correct. And I don't see how it's guaranteed that the timer is deleted
before l2cap_conn_del() is called which kfree's the l2cap_conn
structure.

> After four hours and ten crashes today, it is the little I
> got. Kernel stuff is tough...

Yes, it is. The little information you got should be enough to solve
this. Thanks for your patience and help !

Does the patch below fix your problem ?

Thanks,

tglx

---
net/bluetooth/l2cap.c | 2 ++
1 file changed, 2 insertions(+)

Index: linux-2.6/net/bluetooth/l2cap.c
===================================================================
--- linux-2.6.orig/net/bluetooth/l2cap.c
+++ linux-2.6/net/bluetooth/l2cap.c
@@ -417,6 +417,8 @@ static void l2cap_conn_del(struct hci_co
 		l2cap_sock_kill(sk);
 	}

+	del_timer(&conn->info_timer);
+
 	hcon->l2cap_data = NULL;
 	kfree(conn);
 }
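
The ordering in the patch above (take the timer off the pending list, then free the object that embeds it) can be modelled in userspace as follows. This is a single-threaded sketch with hypothetical names: it models only the list unlinking, not a handler running concurrently on another CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct node {
	struct node *next, *prev;
};

/* Hypothetical stand-in for struct l2cap_conn with its embedded timer. */
struct conn {
	struct node info_timer;
};

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

/* A node with next == NULL is treated as "not pending". */
static bool timer_pending_model(const struct node *t)
{
	return t->next != NULL;
}

/* Queue the timer on the pending list unless it is already queued. */
static void timer_arm(struct node *head, struct node *t)
{
	if (timer_pending_model(t))
		return;
	t->next = head->next;
	t->prev = head;
	head->next->prev = t;
	head->next = t;
}

/* Unlink the timer if it is queued; returns true if it was pending. */
static bool timer_cancel(struct node *t)
{
	if (!timer_pending_model(t))
		return false;
	t->prev->next = t->next;
	t->next->prev = t->prev;
	t->next = t->prev = NULL;
	return true;
}

/* Count the entries currently on the pending list. */
static size_t pending_count(const struct node *head)
{
	size_t n = 0;
	const struct node *p;

	for (p = head->next; p != head; p = p->next)
		n++;
	return n;
}

static struct conn *conn_new(void)
{
	struct conn *c = malloc(sizeof(*c));

	assert(c != NULL);
	c->info_timer.next = c->info_timer.prev = NULL;
	return c;
}

/* Teardown in the order the patch uses: cancel first, free second. */
static void conn_del(struct conn *c)
{
	timer_cancel(&c->info_timer);
	free(c);
}
```

If conn_del() skipped the timer_cancel() call, the pending list would keep a pointer into freed memory, and the next list walk would dereference it. That matches the pattern the oopses in this thread show.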

2008-02-22 11:34:31

by David Woodhouse

Subject: Re: Kernel oops with bluetooth usb dongle


On Fri, 2008-02-22 at 08:23 +0100, Thomas Gleixner wrote:
>
> + del_timer(&conn->info_timer);
> +
> hcon->l2cap_data = NULL;
> kfree(conn);

Shouldn't that be del_timer_sync() ?

--
dwmw2

2008-02-22 11:44:00

by Thomas Gleixner

Subject: Re: Kernel oops with bluetooth usb dongle

On Fri, 22 Feb 2008, David Woodhouse wrote:

> On Fri, 2008-02-22 at 08:23 +0100, Thomas Gleixner wrote:
> >
> > + del_timer(&conn->info_timer);
> > +
> > hcon->l2cap_data = NULL;
> > kfree(conn);
>
> Shouldn't that be del_timer_sync() ?

Hmm, probably yes.

tglx

2008-02-26 00:03:41

by kelk1

Subject: Re: Kernel oops with bluetooth usb dongle

-------------- Original message ----------------------
From: Thomas Gleixner <[email protected]>
> On Fri, 22 Feb 2008, David Woodhouse wrote:
>
> > On Fri, 2008-02-22 at 08:23 +0100, Thomas Gleixner wrote:
> > >
> > > + del_timer(&conn->info_timer);
> > > +
> > > hcon->l2cap_data = NULL;
> > > kfree(conn);
> >
> > Shouldn't that be del_timer_sync() ?
>
> Hmm, probably yes.
>
Hi,

Great news: adding only del_timer_sync() to 2.6.25-rc3 does prevent the crash.

Bad news: I still cannot use the device.

hcitool inq, hcitool scan, hcitool name <btaddr> and hcitool info <btaddr>
commands work.

The hcitool cc <btaddr>, sdptool <btaddr> and rfcomm connect commands fail, most of them
with a 'Connection reset by peer' error.

# rpm -q bluez-utils
bluez-utils-3.27-1mdv2008.1

Thank you,
--
Eric

2008-02-26 03:13:54

by Marcel Holtmann

Subject: Re: Kernel oops with bluetooth usb dongle

Hi Quel,

> -------------- Original message ----------------------
> From: Thomas Gleixner <[email protected]>
>> On Fri, 22 Feb 2008, David Woodhouse wrote:
>>
>>> On Fri, 2008-02-22 at 08:23 +0100, Thomas Gleixner wrote:
>>>>
>>>> + del_timer(&conn->info_timer);
>>>> +
>>>> hcon->l2cap_data = NULL;
>>>> kfree(conn);
>>>
>>> Shouldn't that be del_timer_sync() ?
>>
>> Hmm, probably yes.
>>
> Hi,
>
> Great news: only adding adding del_timer_sync() to 2.6.25-rc3 does
> prevent the crash.
>
> Bad news: I still cannot use the device.
>
> hcitool inq, hcitool scan, hcitool name <btaddr> and hcitool info
> <btaddr>
> commands work.
>
> hcitool cc <btaddr>, sdptool <btaddr>, rfcomm connect command fail,
> most of them
> with a 'Connection reset by peer' error.

what does "hciconfig hci0 version" tell you about your device? Some of
the non-major Bluetooth chips are broken and might need an
extra tweak within the USB driver.

Regards

Marcel

2008-02-26 08:29:18

by Thomas Gleixner

Subject: [PATCH] bluetooth: delete timer in l2cap_conn_del()

Delete a possibly armed timer before kfree'ing the connection object.

Solves: http://lkml.org/lkml/2008/2/15/514

Reported-by: Quel Qun <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

---
net/bluetooth/l2cap.c | 2 ++
1 file changed, 2 insertions(+)

Index: linux-2.6/net/bluetooth/l2cap.c
===================================================================
--- linux-2.6.orig/net/bluetooth/l2cap.c
+++ linux-2.6/net/bluetooth/l2cap.c
@@ -417,6 +417,8 @@ static void l2cap_conn_del(struct hci_co
 		l2cap_sock_kill(sk);
 	}

+	del_timer_sync(&conn->info_timer);
+
 	hcon->l2cap_data = NULL;
 	kfree(conn);
 }

2008-02-26 15:49:47

by kelk1

Subject: Re: Kernel oops with bluetooth usb dongle

-------------- Original message ----------------------
From: Marcel Holtmann <[email protected]>
> Hi Quel,
>
> > Bad news: I still cannot use the device.
> >
> > hcitool inq, hcitool scan, hcitool name <btaddr> and hcitool info
> > <btaddr>
> > commands work.
> >
> > hcitool cc <btaddr>, sdptool <btaddr>, rfcomm connect command fail,
> > most of them
> > with a 'Connection reset by peer' error.
>
> what does "hciconfig hci0 version" tell you about your device? Some of
> the none major based Bluetooth chips are broken and might need an
> extra tweak within the USB driver.
>

Marcel,

# hciconfig hci0 version
hci0: Type: USB
BD Address: 00:03:0D:00:15:47 ACL MTU: 192:8 SCO MTU: 64:8
HCI Ver: 1.1 (0x1) HCI Rev: 0xbc LMP Ver: 1.1 (0x1) LMP Subver: 0xbc
Manufacturer: Cambridge Silicon Radio (10)

# lsusb | grep Cambridge
Bus 003 Device 002: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)

This device works fine in 2.6.23.1 and got broken circa 2.6.24 rcs.

Thank you,
--
kk1

2008-02-26 19:38:40

by Marcel Holtmann

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()

Hi Quel,

> Delete a possibly armed timer before kfree'ing the connection object.
>
> Solves: http://lkml.org/lkml/2008/2/15/514
>
> Reported-by:Quel Qun <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
>
> ---
> net/bluetooth/l2cap.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> Index: linux-2.6/net/bluetooth/l2cap.c
> ===================================================================
> --- linux-2.6.orig/net/bluetooth/l2cap.c
> +++ linux-2.6/net/bluetooth/l2cap.c
> @@ -417,6 +417,8 @@ static void l2cap_conn_del(struct hci_co
> l2cap_sock_kill(sk);
> }
>
> + del_timer_sync(&conn->info_timer);
> +
> hcon->l2cap_data = NULL;
> kfree(conn);
> }

can you confirm that this actually fixes the issue?

Thomas, if confirmed, this is Acked-by me.

Regards

Marcel

2008-02-27 01:42:38

by David Miller

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()

From: Thomas Gleixner <[email protected]>
Date: Tue, 26 Feb 2008 09:28:13 +0100 (CET)

> Delete a possibly armed timer before kfree'ing the connection object.
>
> Solves: http://lkml.org/lkml/2008/2/15/514
>
> Reported-by:Quel Qun <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>

I'll apply this, thanks Thomas.

2008-02-27 09:55:26

by Marcel Holtmann

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()

Hi Dave,

> From: Thomas Gleixner <[email protected]>
> Date: Tue, 26 Feb 2008 09:28:13 +0100 (CET)
>
>> Delete a possibly armed timer before kfree'ing the connection object.
>>
>> Solves: http://lkml.org/lkml/2008/2/15/514
>>
>> Reported-by:Quel Qun <[email protected]>
>> Signed-off-by: Thomas Gleixner <[email protected]>
>
> I'll apply this, thanks Thomas.

can you please wait for a confirmation from Quel that this fixes it?
My ACK is conditional on him confirming that it fixes it for sure.

Regards

Marcel

2008-02-27 12:21:21

by Thomas Gleixner

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()

On Wed, 27 Feb 2008, Marcel Holtmann wrote:

> Hi Dave,
>
> > From: Thomas Gleixner <[email protected]>
> > Date: Tue, 26 Feb 2008 09:28:13 +0100 (CET)
> >
> > > Delete a possibly armed timer before kfree'ing the connection object.
> > >
> > > Solves: http://lkml.org/lkml/2008/2/15/514
> > >
> > > Reported-by:Quel Qun <[email protected]>
> > > Signed-off-by: Thomas Gleixner <[email protected]>
> >
> > I'll apply this, thanks Thomas.
>
> can you please wait for a confirmation from Quel that this fixes it. My ACK is
> based on that he confirms that it fixes it for sure.

http://lkml.org/lkml/2008/2/25/463

Thanks,

tglx

2008-02-27 19:08:17

by David Miller

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()

From: Marcel Holtmann <[email protected]>
Date: Wed, 27 Feb 2008 10:55:07 +0100

> Hi Dave,
>
> > From: Thomas Gleixner <[email protected]>
> > Date: Tue, 26 Feb 2008 09:28:13 +0100 (CET)
> >
> >> Delete a possibly armed timer before kfree'ing the connection object.
> >>
> >> Solves: http://lkml.org/lkml/2008/2/15/514
> >>
> >> Reported-by:Quel Qun <[email protected]>
> >> Signed-off-by: Thomas Gleixner <[email protected]>
> >
> > I'll apply this, thanks Thomas.
>
> Can you please wait for confirmation from Quel that this fixes it?
> My ACK depends on him confirming that it definitely does.

It doesn't hurt to toss this to Linus now; if it's bogus, we
have tons of time to revert it.

2008-02-27 20:22:16

by kelk1

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()


-------------- Original message ----------------------
From: David Miller <[email protected]>
> From: Marcel Holtmann <[email protected]>
> Date: Wed, 27 Feb 2008 10:55:07 +0100
>
> > Hi Dave,
> >
> > > From: Thomas Gleixner <[email protected]>
> > > Date: Tue, 26 Feb 2008 09:28:13 +0100 (CET)
> > >
> > >> Delete a possibly armed timer before kfree'ing the connection object.
> > >>
> > >> Solves: http://lkml.org/lkml/2008/2/15/514
> > >>
> > > >> Reported-by: Quel Qun <[email protected]>
> > >> Signed-off-by: Thomas Gleixner <[email protected]>
> > >
> > > I'll apply this, thanks Thomas.
> >
> > Can you please wait for confirmation from Quel that this fixes it?
> > My ACK depends on him confirming that it definitely does.
>
> It doesn't hurt to toss this to Linus now; if it's bogus, we
> have tons of time to revert it.

As I said, it prevents the crash, but does not 'fix' my problem, in that I still cannot use the dongle.
--
kk1

2008-02-27 20:31:45

by Thomas Gleixner

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()

On Wed, 27 Feb 2008, Quel Qun wrote:
> > > > I'll apply this, thanks Thomas.
> > >
> > > Can you please wait for confirmation from Quel that this fixes it?
> > > My ACK depends on him confirming that it definitely does.
> >
> > It doesn't hurt to toss this to Linus now; if it's bogus, we
> > have tons of time to revert it.
>
> As I said, it prevents the crash, but does not 'fix' my problem, in
> that I still cannot use the dongle.

The dysfunction of your dongle is a separate problem, and I hope
that the bluetooth wizards will help you to get this sucker running.

The timer not being deactivated before the data structure
containing it is freed is a simple bug, which needs to be addressed ASAP.

Thanks again for your patience in tracking this down,

tglx
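
A minimal userspace sketch of the bug class described above, for readers
who want to see it in isolation. Everything named toy_* is hypothetical
and stands in for kernel machinery; it is not struct timer_list,
del_timer() or the real l2cap_conn_del(). The point it illustrates is
that an armed timer embedded in an object stays linked on a global
pending list, so it has to be unlinked before the object is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a kernel timer: a node on a circular pending list.
 * Hypothetical; this is not struct timer_list. */
struct toy_timer {
    struct toy_timer *prev, *next;
    int pending;
};

/* Global list head of armed timers; the "timer wheel" of this sketch. */
static struct toy_timer toy_pending = { &toy_pending, &toy_pending, 0 };

/* Arm a timer by linking it into the pending list. */
static void toy_add_timer(struct toy_timer *t)
{
    assert(!t->pending); /* arming twice would corrupt the list */
    t->next = toy_pending.next;
    t->prev = &toy_pending;
    toy_pending.next->prev = t;
    toy_pending.next = t;
    t->pending = 1;
}

/* Disarm a timer; returns 1 if it was armed, 0 otherwise. */
static int toy_del_timer(struct toy_timer *t)
{
    if (!t->pending)
        return 0;
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->prev = t->next = NULL;
    t->pending = 0;
    return 1;
}

/* Walk the pending list, as a scan for the next expiry would. */
static int toy_pending_count(void)
{
    int n = 0;
    for (struct toy_timer *t = toy_pending.next; t != &toy_pending; t = t->next)
        n++;
    return n;
}

/* Connection object that embeds its timer, like conn->info_timer. */
struct toy_conn {
    struct toy_timer info_timer;
};

/* Allocate a connection and arm its embedded timer. */
static struct toy_conn *toy_conn_add(void)
{
    struct toy_conn *conn = calloc(1, sizeof(*conn));
    if (conn)
        toy_add_timer(&conn->info_timer);
    return conn;
}

/* Tear down a connection. */
static void toy_conn_del(struct toy_conn *conn)
{
    /* Unlink first: otherwise the list keeps a pointer into freed memory. */
    toy_del_timer(&conn->info_timer);
    free(conn);
}
```

Without the toy_del_timer() call in toy_conn_del(), a later
toy_pending_count() would walk through freed memory, which is loosely
analogous to the oops in get_next_timer_interrupt() from the original
report. The sketch is single-threaded, so it does not model the SMP case
in which a timer handler may still be running on another CPU, which is
the case del_timer_sync() exists for.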

2008-02-27 22:08:47

by David Miller

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()


Marcel/Dave, can you or someone else follow up with
the bug reporter to get their dongle working now that
the OOPS problem is fixed?

Let's not lose track of this bug, thanks.

2008-02-28 01:04:20

by Dave Young

Subject: Re: Kernel oops with bluetooth usb dongle

On Tue, Feb 26, 2008 at 8:03 AM, Quel Qun <[email protected]> wrote:
> -------------- Original message ----------------------
> From: Thomas Gleixner <[email protected]>
>
>
> > On Fri, 22 Feb 2008, David Woodhouse wrote:
> >
> > > On Fri, 2008-02-22 at 08:23 +0100, Thomas Gleixner wrote:
> > > >
> > > > + del_timer(&conn->info_timer);
> > > > +
> > > > hcon->l2cap_data = NULL;
> > > > kfree(conn);
> > >
> > > Shouldn't that be del_timer_sync() ?
> >
> > Hmm, probably yes.
> >
> Hi,
>
> Great news: only adding del_timer_sync() to 2.6.25-rc3 does prevent the crash.
>
> Bad news: I still cannot use the device.
>
> hcitool inq, hcitool scan, hcitool name <btaddr> and hcitool info <btaddr>
> commands work.
>
> hcitool cc <btaddr>, sdptool <btaddr> and rfcomm connect commands fail, most of them
> with a 'Connection reset by peer' error.

Could you send the dmesg and hcidump output captured while connecting (e.g.
rfcomm connect)?

>
> # rpm -q bluez-utils
> bluez-utils-3.27-1mdv2008.1
>
> Thank you,
> --
> Eric
>

2008-02-28 01:17:48

by kelk1

Subject: Re: [PATCH] bluetooth: delete timer in l2cap_conn_del()

-------------- Original message ----------------------
From: David Miller <[email protected]>
>
> Marcel/Dave, can you or someone else follow up with
> the bug reporter to get their dongle working now that
> the OOPS problem is fixed?
>
> Let's not lose track of this bug, thanks.

I entered #10126

http://bugzilla.kernel.org/show_bug.cgi?id=10126

Thank you,
--
kk1

2008-02-28 01:38:17

by Dave Young

Subject: Re: Kernel oops with bluetooth usb dongle

Adding davem to the CC list.

On Thu, Feb 28, 2008 at 9:03 AM, Dave Young <[email protected]> wrote:
>
> On Tue, Feb 26, 2008 at 8:03 AM, Quel Qun <[email protected]> wrote:
> > -------------- Original message ----------------------
> > From: Thomas Gleixner <[email protected]>
> >
> >
> > > On Fri, 22 Feb 2008, David Woodhouse wrote:
> > >
> > > > On Fri, 2008-02-22 at 08:23 +0100, Thomas Gleixner wrote:
> > > > >
> > > > > + del_timer(&conn->info_timer);
> > > > > +
> > > > > hcon->l2cap_data = NULL;
> > > > > kfree(conn);
> > > >
> > > > Shouldn't that be del_timer_sync() ?
> > >
> > > Hmm, probably yes.
> > >
> > Hi,
> >
> > Great news: only adding del_timer_sync() to 2.6.25-rc3 does prevent the crash.
> >
> > Bad news: I still cannot use the device.
> >
> > hcitool inq, hcitool scan, hcitool name <btaddr> and hcitool info <btaddr>
> > commands work.
> >
> > hcitool cc <btaddr>, sdptool <btaddr> and rfcomm connect commands fail, most of them
> > with a 'Connection reset by peer' error.
>
> Could you send the dmesg and hcidump output captured while connecting (e.g.
> rfcomm connect)?
>
>
>
> >
> > # rpm -q bluez-utils
> > bluez-utils-3.27-1mdv2008.1
> >
> > Thank you,
> > --
> > Eric
> >
>