2006-12-23 23:44:06

by Pavel Machek

Subject: ext3-related crash in 2.6.20-rc1

Hi!

I got this nasty oops while playing with debugger. Not sure if that is
related; it also might be something with bluetooth; I already know it
corrupts memory during suspend, perhaps it corrupts memory in some
error path?



Pavel


l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
PM: Removing info for bluetooth:acl00803715A329
e1000: eth0: e1000_watchdog: NIC Link is Down
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
e1000: eth0: e1000_watchdog: NIC Link is Down
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
------------[ cut here ]------------
kernel BUG at fs/buffer.c:1235!
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c01933c2>] Not tainted VLI
EFLAGS: 00010046 (2.6.20-rc1 #379)
EIP is at __find_get_block+0x1b2/0x1c0
eax: 00000086 ebx: 00001000 ecx: 00000000 edx: 006780b2
esi: 0033d60d edi: 00001000 ebp: 000000cf esp: c9135c90
ds: 007b es: 007b ss: 0068
Process phone (pid: 1161, ti=c9134000 task=f7949030 task.ti=c9134000)
Stack: 006780b2 00000000 f7ec2820 00000003 ad40ad40 f7d8f5ba c0652a48 00000000
f88da000 00000012 0000000f f65c9000 f65c9230 c016c9df c016c3c4 00001000
0033d60d 00001000 000000cf c01933ef 00001000 5a0ff380 0000007f f65c9234
Call Trace:
[<c01933ef>] __getblk+0x1f/0x290
[<c01db680>] __ext3_get_inode_loc+0x120/0x3a0
[<c01db9d7>] ext3_reserve_inode_write+0x27/0x80
[<c01dbe1a>] ext3_mark_inode_dirty+0x1a/0x40
[<c01dc2c9>] ext3_dirty_inode+0x79/0xb0
[<c018c854>] __mark_inode_dirty+0x34/0x1c0
[<c0154934>] __generic_file_aio_write_nolock+0x244/0x590
[<c0154cd9>] generic_file_aio_write+0x59/0xd0
[<c01da050>] ext3_file_write+0x30/0xc0
[<c0170ad7>] do_sync_write+0xc7/0x130
[<c0171266>] vfs_write+0xa6/0x160
[<c0171b21>] sys_write+0x41/0x70
[<c010304c>] syscall_call+0x7/0xb
[<b7f18d2e>] 0xb7f18d2e
=======================
Code: 00 8b 7c 24 18 f3 a5 fb 8b 44 24 10 85 c0 0f 84 2c ff ff ff 8b 44 24 10 e8 5c ca ff ff e9 1e ff ff ff 89 d8 e8 50 ca ff ff eb 8d <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 89 f6 55 57 56 53 83 ec 48
EIP: [<c01933c2>] __find_get_block+0x1b2/0x1c0 SS:ESP 0068:c9135c90


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2006-12-23 23:55:18

by Pavel Machek

Subject: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

Hi!

> I got this nasty oops while playing with debugger. Not sure if that is
> related; it also might be something with bluetooth; I already know it
> corrupts memory during suspend, perhaps it corrupts memory in some
> error path?

Okay, I spoke too soon. bluetooth & suspend memory corruption was
_way_ harder to reproduce than expected. Took me 5-or-so-suspend
cycles... so it is probably unrelated to the previous crash.

I was getting pretty regular crashes with bluetooth & gdb, but I was
not using bluetooth at the time of ext3-related crash.

Pavel

acpi acpi: resuming
__tx_submit: hci0 tx submit failed urb c20efb08 type 2 err -113
agpgart-intel 0000:00:00.0: resuming
pci 0000:00:02.0: resuming
pci 0000:00:02.1: resuming
PM: Writing back config space on device 0000:00:02.1 at offset 1 (was 900000, writing 900003)
HDA Intel 0000:00:1b.0: resuming
PM: Writing back config space on device 0000:00:1b.0 at offset 1 (was 100106, writing 100102)
PCI: Setting latency timer of device 0000:00:1b.0 to 64
pci 0000:00:1c.0: resuming
PCI: Setting latency timer of device 0000:00:1c.0 to 64
pci 0000:00:1c.1: resuming
PCI: Setting latency timer of device 0000:00:1c.1 to 64
pci 0000:00:1c.2: resuming
PCI: Setting latency timer of device 0000:00:1c.2 to 64
pci 0000:00:1c.3: resuming
PM: Writing back config space on device 0000:00:1c.3 at offset f (was 40400, writing 4040b)
PM: Writing back config space on device 0000:00:1c.3 at offset 9 (was 10001, writing e421e421)
PM: Writing back config space on device 0000:00:1c.3 at offset 8 (was 0, writing ebf0ea00)
PM: Writing back config space on device 0000:00:1c.3 at offset 7 (was 20000000, writing 8070)
PM: Writing back config space on device 0000:00:1c.3 at offset 3 (was 810000, writing 810010)
PM: Writing back config space on device 0000:00:1c.3 at offset 1 (was 100000, writing 100107)
PCI: Setting latency timer of device 0000:00:1c.3 to 64
uhci_hcd 0000:00:1d.0: resuming
PCI: Setting latency timer of device 0000:00:1d.0 to 64
usb usb4: root hub lost power or was reset
uhci_hcd 0000:00:1d.1: resuming
PCI: Setting latency timer of device 0000:00:1d.1 to 64
usb usb2: root hub lost power or was reset
uhci_hcd 0000:00:1d.2: resuming
PCI: Setting latency timer of device 0000:00:1d.2 to 64
usb usb5: root hub lost power or was reset
uhci_hcd 0000:00:1d.3: resuming
PCI: Setting latency timer of device 0000:00:1d.3 to 64
usb usb3: root hub lost power or was reset
ehci_hcd 0000:00:1d.7: resuming
PCI: Setting latency timer of device 0000:00:1d.7 to 64
pci 0000:00:1e.0: resuming
PM: Writing back config space on device 0000:00:1e.0 at offset 1 (was 100005, writing 100007)
PCI: Setting latency timer of device 0000:00:1e.0 to 64
pci 0000:00:1f.0: resuming
PIIX_IDE 0000:00:1f.1: resuming
ahci 0000:00:1f.2: resuming
PCI: Setting latency timer of device 0000:00:1f.2 to 64
pci 0000:00:1f.3: resuming
pci 0000:02:00.0: resuming
PM: Writing back config space on device 0000:02:00.0 at offset 1 (was 100107, writing 100103)
pci 0000:03:00.0: resuming
yenta_cardbus 0000:15:00.0: resuming
ohci1394 0000:15:00.1: resuming
PM: Writing back config space on device 0000:15:00.1 at offset 4 (was 0, writing e4301000)
PM: Writing back config space on device 0000:15:00.1 at offset 3 (was 800000, writing 804000)
PM: Writing back config space on device 0000:15:00.1 at offset 1 (was 2100000, writing 2100006)
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[21] MMIO=[e4301000-e43017ff] Max Packet=[2048] IR/IT contexts=[4/4]
sdhci 0000:15:00.2: resuming
PM: Writing back config space on device 0000:15:00.2 at offset 4 (was 0, writing e4301800)
PM: Writing back config space on device 0000:15:00.2 at offset 3 (was 800000, writing 804000)
PM: Writing back config space on device 0000:15:00.2 at offset 1 (was 2100000, writing 2100006)
system 00:00: resuming
pnp 00:01: resuming
system 00:02: resuming
pnp 00:03: resuming
pnp 00:04: resuming
pnp 00:05: resuming
pnp 00:06: resuming
pnp 00:07: resuming
i8042 kbd 00:08: resuming
pnp: Device 00:08 does not support activation.
i8042 aux 00:09: resuming
pnp: Device 00:09 does not support activation.
pnp 00:0a: resuming
pnp 00:0b: resuming
platform bluetooth: resuming
pcspkr pcspkr: resuming
vesafb vesafb.0: resuming
serial8250 serial8250: resuming
usb usb1: resuming
hub 1-0:1.0: resuming
usb usb2: resuming
hub 2-0:1.0: resuming
usb usb4: resuming
ata2: SATA link down (SStatus 0 SControl 0)
ata3: SATA link down (SStatus 0 SControl 0)
ata4: SATA link down (SStatus 0 SControl 0)
usb usb5: resuming
hub 4-0:1.0: resuming
hub 5-0:1.0: resuming
usb usb3: resuming
hub 3-0:1.0: resuming
i8042 i8042: resuming
atkbd serio0: resuming
psmouse serio1: resuming
mmcblk mmc0:cc53: resuming
sd 0:0:0:0: resuming
usb 3-2: resuming
usbdev3.14_ep00: PM: resume from 0, parent 3-2 still 2
usb 3-2:1.0: PM: resume from 2, parent 3-2 still 2
usb 3-2:1.0: resuming
usbdev3.14_ep81: PM: resume from 0, parent 3-2:1.0 still 2
usbdev3.14_ep02: PM: resume from 0, parent 3-2:1.0 still 2
usbdev3.14_ep83: PM: resume from 0, parent 3-2:1.0 still 2
usb 3-1: resuming
usbdev3.15_ep00: PM: resume from 0, parent 3-1 still 2
hci_usb 3-1:1.0: PM: resume from 2, parent 3-1 still 2
hci_usb 3-1:1.0: resuming
hci0: PM: resume from 0, parent 3-1:1.0 still 2
usbdev3.15_ep81: PM: resume from 0, parent 3-1:1.0 still 2
usbdev3.15_ep82: PM: resume from 0, parent 3-1:1.0 still 2
usbdev3.15_ep02: PM: resume from 0, parent 3-1:1.0 still 2
hci_usb 3-1:1.1: PM: resume from 2, parent 3-1 still 2
hci_usb 3-1:1.1: resuming
usbdev3.15_ep83: PM: resume from 0, parent 3-1:1.1 still 2
usbdev3.15_ep03: PM: resume from 0, parent 3-1:1.1 still 2
usb 3-1:1.2: PM: resume from 2, parent 3-1 still 2
usb 3-1:1.2: resuming
usbdev3.15_ep84: PM: resume from 0, parent 3-1:1.2 still 2
usbdev3.15_ep04: PM: resume from 0, parent 3-1:1.2 still 2
usb 3-1:1.3: PM: resume from 2, parent 3-1 still 2
usb 3-1:1.3: resuming
Restarting tasks ... <6>usb 3-1: USB disconnect, address 15
PM: Removing info for No Bus:usbdev3.15_ep81
PM: Removing info for No Bus:usbdev3.15_ep82
PM: Removing info for No Bus:usbdev3.15_ep02
slab error in verify_redzone_free(): cache `size-512': memory outside object was overwritten
[<c016a1b8>] cache_free_debugcheck+0x128/0x1d0
[<c04b58e3>] hci_usb_close+0xf3/0x160
[<c016b530>] kfree+0x50/0xa0
[<c04b58e3>] hci_usb_close+0xf3/0x160
[<c04b59be>] hci_usb_disconnect+0x2e/0x90
[<c0454f23>] usb_disable_interface+0x53/0x70
[<c04576f8>] usb_unbind_interface+0x38/0x80
[<c032f908>] __device_release_driver+0x68/0xb0
[<c032fc3e>] device_release_driver+0x1e/0x40
[<c032f1db>] bus_remove_device+0x8b/0xa0
[<c032dbc9>] device_del+0x159/0x1c0
[<c04559ad>] usb_disable_device+0x4d/0x100
[<c044fe8a>] usb_disconnect+0x9a/0x110
[<c0452405>] hub_thread+0x355/0xbd0
[<c061426e>] schedule+0x2de/0x8f0
[<c013c640>] autoremove_wake_function+0x0/0x50
[<c04520b0>] hub_thread+0x0/0xbd0
[<c013c58c>] kthread+0xec/0xf0
[<c013c4a0>] kthread+0x0/0xf0
[<c0103be7>] kernel_thread_helper+0x7/0x10
=======================
e91f6288: redzone 1:0x5a5a5a5a, redzone 2:0xc054aeae.
------------[ cut here ]------------
kernel BUG at mm/slab.c:2878!
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c016a242>] Not tainted VLI
EFLAGS: 00010002 (2.6.20-rc1 #383)
EIP is at cache_free_debugcheck+0x1b2/0x1d0
eax: e91f6284 ebx: e91f6078 ecx: 00052c00 edx: 0000020c
esi: c20df680 edi: e91f6288 ebp: 5a5a5a5a esp: c2227e30
ds: 007b es: 007b ss: 0068
Process khubd (pid: 303, ti=c2226000 task=c21f6a70 task.ti=c2226000)
Stack: c06b3fe0 e91f6288 5a5a5a5a c054aeae c04b58e3 e91f6040 c20df680 c20d9164
e91f628c 00000282 c016b530 c20efb08 c20efaf4 e977a274 0000000c c04b58e3
e977a230 e977a260 f7b3f904 e977a1a4 00000001 e977a1a4 f7b3f904 c07e2060
Call Trace:
[<c054aeae>] sock_alloc_send_skb+0x16e/0x1c0
[<c04b58e3>] hci_usb_close+0xf3/0x160
[<c016b530>] kfree+0x50/0xa0
[<c04b58e3>] hci_usb_close+0xf3/0x160
[<c04b59be>] hci_usb_disconnect+0x2e/0x90
[<c0454f23>] usb_disable_interface+0x53/0x70
[<c04576f8>] usb_unbind_interface+0x38/0x80
[<c032f908>] __device_release_driver+0x68/0xb0
[<c032fc3e>] device_release_driver+0x1e/0x40
[<c032f1db>] bus_remove_device+0x8b/0xa0
[<c032dbc9>] device_del+0x159/0x1c0
[<c04559ad>] usb_disable_device+0x4d/0x100
[<c044fe8a>] usb_disconnect+0x9a/0x110
[<c0452405>] hub_thread+0x355/0xbd0
[<c061426e>] schedule+0x2de/0x8f0
[<c013c640>] autoremove_wake_function+0x0/0x50
[<c04520b0>] hub_thread+0x0/0xbd0
[<c013c58c>] kthread+0xec/0xf0
[<c013c4a0>] kthread+0x0/0xf0
[<c0103be7>] kernel_thread_helper+0x7/0x10
=======================
Code: f0 2c 5a 75 8b b9 39 31 6b c0 89 f2 b8 88 e8 61 c0 e8 73 f4 ff ff eb 89 81 fb a5 c2 0f 17 0f 85 6c ff ff ff 90 8d 74 26 00 eb 8e <0f> 0b eb fe 0f 0b eb fe 8d b6 00 00 00 00 0f 0b eb fe 8b 52 0c
EIP: [<c016a242>] cache_free_debugcheck+0x1b2/0x1d0 SS:ESP 0068:c2227e30
<7>PM: Adding info for No Bus:vcs63
PM: Adding info for No Bus:vcsa63
PM: Removing info for No Bus:vcs63
PM: Removing info for No Bus:vcsa63
done.
Enabling non-boot CPUs ...
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
Initializing CPU#1
Calibrating delay using timer specific routine.. 3657.63 BogoMIPS (lpj=18288162)
CPU: After generic identify, caps: bfe9fbff 00100000 00000000 00000000 0000c1a9 00000000 00000000
monitor/mwait feature present.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU: After all inits, caps: bfe9fbff 00100000 00000000 00002940 0000c1a9 00000000 00000000
CPU1: Intel Genuine Intel(R) CPU T2400 @ 1.83GHz stepping 08
PM: Adding info for No Bus:msr1
CPU1 is up
ata1: waiting for device to spin up (7 secs)



2006-12-24 00:02:09

by Pavel Machek

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

Hi!

> > I got this nasty oops while playing with debugger. Not sure if that is
> > related; it also might be something with bluetooth; I already know it
> > corrupts memory during suspend, perhaps it corrupts memory in some
> > error path?
>
> Okay, I spoke too soon. bluetooth & suspend memory corruption was
> _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> cycles... so it is probably unrelated to the previous crash.
>
> I was getting pretty regular crashes with bluetooth & gdb, but I was
> not using bluetooth at the time of ext3-related crash.

And for completeness, here's bluetooth + gdb oops. Ok, I'm not _sure_
it is bluetooth related. I'll try it without bluetooth in a while.

Pavel

PM: Adding info for No Bus:vcsa8
coda_read_super: Bad mount data
coda_read_super: device index: 0
coda_read_super: rootfid is (01234567.ffffffff.080519b0.00000000)
PM: Removing info for No Bus:vcs10
PM: Removing info for No Bus:vcsa10
coda_upcall: Venus dead on (op,un) (7.2) flags 10
Failure of coda_cnode_make for root: error -19
hci_cmd_task: hci0 command tx timeout
PM: Adding info for No Bus:rfcomm1
PM: Adding info for bluetooth:acl00803715A329
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
hci_acldata_packet: hci0 ACL packet for unknown connection handle 12
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
l2cap_recv_acldata: Unexpected continuation frame (len 0)
PM: Removing info for bluetooth:acl00803715A329
------------[ cut here ]------------
kernel BUG at fs/buffer.c:1235!
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU: 1
EIP: 0060:[<c01912b2>] Not tainted VLI
EFLAGS: 00010046 (2.6.20-rc1 #383)
EIP is at __find_get_block+0x1b2/0x1c0
eax: 00000086 ebx: 00001000 ecx: 00000000 edx: 006780b2
esi: 0033d60d edi: 00001000 ebp: 000000cf esp: f75a3c90
ds: 007b es: 007b ss: 0068
Process phone (pid: 1795, ti=f75a2000 task=c2287030 task.ti=f75a2000)
Stack: 006780b2 00000000 c21e9a08 00000003 ad40ad40 f7d8d1dc c0629908 00000000
f89fa000 00000012 00000002 00000003 ad55ad55 f7d8d182 c0629908 00001000
0033d60d 00001000 000000cf c01912df 00001000 f7dbf74c 00000000 00000008
Call Trace:
[<c01912df>] __getblk+0x1f/0x290
[<c016a284>] check_poison_obj+0x24/0x1a0
[<c0280115>] soft_cursor+0x175/0x1e0
[<c01b1ad0>] __ext3_get_inode_loc+0x120/0x3a0
[<c016954e>] dbg_redzone1+0xe/0x20
[<c016a43e>] cache_alloc_debugcheck_after+0x3e/0x150
[<c01c1703>] journal_start+0x83/0xe0
[<c01b1e27>] ext3_reserve_inode_write+0x27/0x80
[<c01b226a>] ext3_mark_inode_dirty+0x1a/0x40
[<c01b2719>] ext3_dirty_inode+0x79/0xb0
[<c018a744>] __mark_inode_dirty+0x34/0x1c0
[<c0181a59>] file_update_time+0x39/0xa0
[<c0152984>] __generic_file_aio_write_nolock+0x244/0x590
[<c0120fad>] __wake_up_sync+0x3d/0x60
[<c06154df>] __mutex_lock_slowpath+0xef/0x230
[<c0152d29>] generic_file_aio_write+0x59/0xd0
[<c01b04a0>] ext3_file_write+0x30/0xc0
[<c016e997>] do_sync_write+0xc7/0x130
[<c013c640>] autoremove_wake_function+0x0/0x50
[<c015f2e9>] remove_vma+0x39/0x50
[<c016f126>] vfs_write+0xa6/0x160
[<c016e8d0>] do_sync_write+0x0/0x130
[<c016f9e1>] sys_write+0x41/0x70
[<c010304c>] syscall_call+0x7/0xb
=======================
Code: 00 8b 7c 24 18 f3 a5 fb 8b 44 24 10 85 c0 0f 84 2c ff ff ff 8b 44 24 10 e8 5c ca ff ff e9 1e ff ff ff 89 d8 e8 50 ca ff ff eb 8d <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 89 f6 55 57 56 53 83 ec 48
EIP: [<c01912b2>] __find_get_block+0x1b2/0x1c0 SS:ESP 0068:f75a3c90



2006-12-24 00:06:24

by Pavel Machek

Subject: not-only-bluetooth memory corruption

On Sun 2006-12-24 01:01:50, Pavel Machek wrote:
> Hi!
>
> > > I got this nasty oops while playing with debugger. Not sure if that is
> > > related; it also might be something with bluetooth; I already know it
> > > corrupts memory during suspend, perhaps it corrupts memory in some
> > > error path?
> >
> > Okay, I spoke too soon. bluetooth & suspend memory corruption was
> > _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> > cycles... so it is probably unrelated to the previous crash.
> >
> > I was getting pretty regular crashes with bluetooth & gdb, but I was
> > not using bluetooth at the time of ext3-related crash.
>
> And for completeness, here's bluetooth + gdb oops. Ok, I'm not _sure_
> it is bluetooth related. I'll try it without bluetooth in a while.

Ok, so this one is not bluetooth related. My little "phone"
application provokes nasty oops, even when talking to
/dev/null. Strange, that code does _nothing_
strange. (http://www.sf.net/projects/tui).

Is there something wrong with gdb?

Pavel
pcmcia: Detected deprecated PCMCIA ioctl usage from process: cardmgr.
pcmcia: This interface will soon be removed from the kernel; please expect breakage unless you upgrade to new tools.
pcmcia: see http://www.kernel.org/pub/linux/utils/kernel/pcmcia/pcmcia.html for details.
cs: IO port probe 0x310-0x380: clean.
cs: IO port probe 0xa00-0xaff: clean.
PM: Adding info for No Bus:vcs10
PM: Adding info for No Bus:vcsa10
PM: Removing info for No Bus:vcs10
PM: Removing info for No Bus:vcsa10
PM: Adding info for No Bus:vcs10
PM: Adding info for No Bus:vcsa10
PM: Removing info for No Bus:vcs10
PM: Removing info for No Bus:vcsa10
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
PM: Adding info for No Bus:vcs11
PM: Adding info for No Bus:vcsa11
PM: Adding info for No Bus:vcs2
PM: Adding info for No Bus:vcs3
PM: Adding info for No Bus:vcs4
PM: Adding info for No Bus:vcsa2
PM: Adding info for No Bus:vcs5
PM: Adding info for No Bus:vcs6
PM: Adding info for No Bus:vcs7
PM: Adding info for No Bus:vcs8
PM: Adding info for No Bus:vcsa3
PM: Adding info for No Bus:vcsa4
PM: Adding info for No Bus:vcsa5
PM: Adding info for No Bus:vcsa6
PM: Adding info for No Bus:vcsa7
PM: Adding info for No Bus:vcsa8
coda_read_super: Bad mount data
coda_read_super: device index: 0
coda_read_super: No pseudo device
PM: Removing info for No Bus:vcs1
PM: Removing info for No Bus:vcsa1
PM: Adding info for No Bus:vcs1
PM: Adding info for No Bus:vcsa1
PM: Removing info for No Bus:vcs1
PM: Removing info for No Bus:vcsa1
PM: Adding info for No Bus:vcs1
PM: Adding info for No Bus:vcsa1
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
------------[ cut here ]------------
kernel BUG at fs/buffer.c:1235!
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU: 1
EIP: 0060:[<c01912b2>] Not tainted VLI
EFLAGS: 00010046 (2.6.20-rc1 #383)
EIP is at __find_get_block+0x1b2/0x1c0
eax: 00000086 ebx: 00001000 ecx: 00000000 edx: 006780b2
esi: 0033d60d edi: 00001000 ebp: 000000cf esp: f6de1c90
ds: 007b es: 007b ss: 0068
Process phone (pid: 1847, ti=f6de0000 task=c2329550 task.ti=f6de0000)
Stack: 006780b2 00000000 c20eb5f8 00000003 05950595 c236d8d2 c0629908 00000000
f88da000 00000012 00000002 00000003 05800580 c236d878 c016a284 00001000
0033d60d 00001000 000000cf c01912df 00001000 5a0df380 0000007f f77b9c0c
Call Trace:
[<c016a284>] check_poison_obj+0x24/0x1a0
[<c01912df>] __getblk+0x1f/0x290
[<c01312e0>] lock_timer_base+0x20/0x50
[<c01317f0>] __mod_timer+0x90/0xa0
[<c01b1ad0>] __ext3_get_inode_loc+0x120/0x3a0
[<c016954e>] dbg_redzone1+0xe/0x20
[<c016a43e>] cache_alloc_debugcheck_after+0x3e/0x150
[<c01c1703>] journal_start+0x83/0xe0
[<c01b1e27>] ext3_reserve_inode_write+0x27/0x80
[<c01b226a>] ext3_mark_inode_dirty+0x1a/0x40
[<c01b2719>] ext3_dirty_inode+0x79/0xb0
[<c018a744>] __mark_inode_dirty+0x34/0x1c0
[<c0181a59>] file_update_time+0x39/0xa0
[<c0152984>] __generic_file_aio_write_nolock+0x244/0x590
[<c06154df>] __mutex_lock_slowpath+0xef/0x230
[<c0152d29>] generic_file_aio_write+0x59/0xd0
[<c0103aa4>] apic_timer_interrupt+0x28/0x30
[<c01b04a0>] ext3_file_write+0x30/0xc0
[<c016e997>] do_sync_write+0xc7/0x130
[<c013c640>] autoremove_wake_function+0x0/0x50
[<c015f2e9>] remove_vma+0x39/0x50
[<c016f126>] vfs_write+0xa6/0x160
[<c016e8d0>] do_sync_write+0x0/0x130
[<c016f9e1>] sys_write+0x41/0x70
[<c010304c>] syscall_call+0x7/0xb
=======================
Code: 00 8b 7c 24 18 f3 a5 fb 8b 44 24 10 85 c0 0f 84 2c ff ff ff 8b 44 24 10 e8 5c ca ff ff e9 1e ff ff ff 89 d8 e8 50 ca ff ff eb 8d <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 89 f6 55 57 56 53 83 ec 48
EIP: [<c01912b2>] __find_get_block+0x1b2/0x1c0 SS:ESP 0068:f6de1c90





Attachments:
(No filename) (4.90 kB)
oops4.bz2 (11.86 kB)

2006-12-24 00:08:07

by Pavel Machek

Subject: ptrace() memory corruption?

On Sun 2006-12-24 01:06:05, Pavel Machek wrote:
> On Sun 2006-12-24 01:01:50, Pavel Machek wrote:
> > Hi!
> >
> > > > I got this nasty oops while playing with debugger. Not sure if that is
> > > > related; it also might be something with bluetooth; I already know it
> > > > corrupts memory during suspend, perhaps it corrupts memory in some
> > > > error path?
> > >
> > > Okay, I spoke too soon. bluetooth & suspend memory corruption was
> > > _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> > > cycles... so it is probably unrelated to the previous crash.
> > >
> > > I was getting pretty regular crashes with bluetooth & gdb, but I was
> > > not using bluetooth at the time of ext3-related crash.
> >
> > And for completeness, here's bluetooth + gdb oops. Ok, I'm not _sure_
> > it is bluetooth related. I'll try it without bluetooth in a while.
>
> Ok, so this one is not bluetooth related. My little "phone"
> application provokes nasty oops, even when talking to
> /dev/null. Strange, that code does _nothing_
> strange. (http://www.sf.net/projects/tui).
>
> Is there something wrong with gdb?

Yep. If I do gdb /bin/bash, run; I'll get similar oops. Am I alone
seeing this?
Pavel

2006-12-24 01:13:30

by Andrew Morton

Subject: Re: ext3-related crash in 2.6.20-rc1

On Sun, 24 Dec 2006 00:43:05 +0100
Pavel Machek <[email protected]> wrote:

> Hi!
>
> I got this nasty oops while playing with debugger. Not sure if that is
> related; it also might be something with bluetooth; I already know it
> corrupts memory during suspend, perhaps it corrupts memory in some
> error path?
>
>
>
> Pavel
>
>
> l2cap_recv_acldata: Unexpected continuation frame (len 0)
> l2cap_recv_acldata: Unexpected continuation frame (len 0)
> l2cap_recv_acldata: Unexpected continuation frame (len 0)
> l2cap_recv_acldata: Unexpected continuation frame (len 0)
> l2cap_recv_acldata: Unexpected continuation frame (len 0)
> l2cap_recv_acldata: Unexpected continuation frame (len 0)
> l2cap_recv_acldata: Unexpected continuation frame (len 0)
> PM: Removing info for bluetooth:acl00803715A329
> e1000: eth0: e1000_watchdog: NIC Link is Down
> e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
> e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
> e1000: eth0: e1000_watchdog: NIC Link is Down
> e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
> e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
> ------------[ cut here ]------------
> kernel BUG at fs/buffer.c:1235!

get thee to fs/buffer.c:1235. You'll see that someone somewhere forgot to
reenable local interrupts.

Were you using gdb at the time? A fix for something like that was merged
into mainline yesterday.

The slab errors which you're reporting in later emails will almost surely
be unrelated to this.
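
Editorial illustration of the point about local interrupts: both buffer.c oopses in this thread report EFLAGS 00010046. On x86, bit 9 of EFLAGS (mask 0x200) is IF, the interrupt-enable flag, and it is clear in that value. The sketch below decodes it in userspace. The helper names are hypothetical, and `check_irqs_on()` here is only a stand-in that assumes the assertion at that line of fs/buffer.c is of the `BUG_ON(irqs_disabled())` kind, which the thread implies but does not quote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 9 of the x86 EFLAGS register is IF, the interrupt-enable flag. */
#define X86_EFLAGS_IF 0x00000200u

/* True when a saved EFLAGS value shows local interrupts disabled. */
static bool eflags_irqs_disabled(uint32_t eflags)
{
	return (eflags & X86_EFLAGS_IF) == 0;
}

/* Userspace stand-in (assumption) for a BUG_ON(irqs_disabled()) style check. */
static void check_irqs_on(uint32_t eflags)
{
	assert(!eflags_irqs_disabled(eflags));
}
```

Passing 0x00010046 to `eflags_irqs_disabled()` reports interrupts as disabled, which is consistent with the diagnosis above; the same value with bit 9 set (0x00010246) would pass the check.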

2006-12-24 01:13:46

by Jiri Slaby

Subject: Re: ptrace() memory corruption?

Pavel Machek wrote:
>> Is there something wrong with gdb?
>
> Yep. If I do gdb /bin/bash, run; I'll get similar oops. Am I alone
> seeing this?

Nope, I have this nasty thing here too and will post oopses in the afternoon,
just before Jezisek comes :).

regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

2006-12-24 01:18:37

by Andrew Morton

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

On Sun, 24 Dec 2006 00:55:01 +0100
Pavel Machek <[email protected]> wrote:

> PM: Removing info for No Bus:usbdev3.15_ep81
> PM: Removing info for No Bus:usbdev3.15_ep82
> PM: Removing info for No Bus:usbdev3.15_ep02
> slab error in verify_redzone_free(): cache `size-512': memory outside object was overwritten
> [<c016a1b8>] cache_free_debugcheck+0x128/0x1d0
> [<c04b58e3>] hci_usb_close+0xf3/0x160
> [<c016b530>] kfree+0x50/0xa0
> [<c04b58e3>] hci_usb_close+0xf3/0x160
> [<c04b59be>] hci_usb_disconnect+0x2e/0x90
> [<c0454f23>] usb_disable_interface+0x53/0x70
> [<c04576f8>] usb_unbind_interface+0x38/0x80
> [<c032f908>] __device_release_driver+0x68/0xb0
> [<c032fc3e>] device_release_driver+0x1e/0x40
> [<c032f1db>] bus_remove_device+0x8b/0xa0
> [<c032dbc9>] device_del+0x159/0x1c0
> [<c04559ad>] usb_disable_device+0x4d/0x100
> [<c044fe8a>] usb_disconnect+0x9a/0x110
> [<c0452405>] hub_thread+0x355/0xbd0
> [<c061426e>] schedule+0x2de/0x8f0
> [<c013c640>] autoremove_wake_function+0x0/0x50
> [<c04520b0>] hub_thread+0x0/0xbd0
> [<c013c58c>] kthread+0xec/0xf0
> [<c013c4a0>] kthread+0x0/0xf0
> [<c0103be7>] kernel_thread_helper+0x7/0x10
> =======================

yes, this one looks like memory scribblage in bluetooth. The
buffer.c assertion failure should now be fixed, please verify.
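
Editorial illustration of what "memory outside object was overwritten" means: a debugging allocator can place known guard words (red zones) on both sides of each object and verify them when the object is freed. The following is a minimal userspace sketch of that idea, not the kernel's slab code. The `rz_*` names and the guard value are hypothetical, and the payload is only guaranteed 4-byte alignment.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary guard pattern chosen for this sketch (assumption). */
#define RZ_GUARD 0xDEADBEEFu

/* Allocate `size` payload bytes with one guard word before and one after. */
static void *rz_alloc(size_t size)
{
	const uint32_t guard = RZ_GUARD;
	unsigned char *raw = malloc(size + 2 * sizeof(guard));

	if (!raw)
		return NULL;
	memcpy(raw, &guard, sizeof(guard));
	memcpy(raw + sizeof(guard) + size, &guard, sizeof(guard));
	return raw + sizeof(guard);
}

/* True when both guard words around the payload are still untouched. */
static bool rz_intact(const void *obj, size_t size)
{
	const unsigned char *p = obj;
	uint32_t front, back;

	memcpy(&front, p - sizeof(front), sizeof(front));
	memcpy(&back, p + size, sizeof(back));
	return front == RZ_GUARD && back == RZ_GUARD;
}

/* Release an object obtained from rz_alloc(). */
static void rz_free(void *obj)
{
	free((unsigned char *)obj - sizeof(uint32_t));
}
```

A write one byte past the end of the payload lands in the trailing guard word, so `rz_intact()` returns false for that object. The report only fires at check time, which is why the trace above points at the free path rather than at whoever did the stray write.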

2006-12-24 11:51:23

by Jiri Slaby

Subject: Re: ptrace() memory corruption?

Jiri Slaby wrote:
> Pavel Machek wrote:
>>> Is there something wrong with gdb?
>> Yep. If I do gdb /bin/bash, run; I'll get similar oops. Am I alone
>> seeing this?
>
> Nope, I have this nasty thing here too and will post oopses in the afternoon,
> just before Jezisek comes :).

Ok, I captured this through netconsole:
[ 8.499155] usb 3-2: new low speed USB device using uhci_hcd and address 2
[ 8.721946] usb 3-2: new device found, idVendor=045e, idProduct=00f0
[ 8.722016] usb 3-2: new device strings: Mfr=1, Product=2, SerialNumber=0
[ 8.722081] usb 3-2: Product: Microsoft ® Laser Mouse 6000
[ 8.722145] usb 3-2: Manufacturer: Microsoft Corporation
[ 8.722344] usb 3-2: configuration #1 chosen from 1 choice
[ 8.753100] input: Microsoft Corporation Microsoft ® Laser Mouse 6000 as /class/input/input4
[ 8.753310] input: USB HID v1.11 Mouse [Microsoft Corporation Microsoft ® Laser Mouse 6000] on usb-0000:00:1d.1-2
[ 58.672510] WARNING (!__warned) at /home/l/latest/xxx/kernel/softirq.c:137 local_bh_enable()
[ 58.672562] [<c0103f1b>] show_trace_log_lvl+0x1a/0x30
[ 58.672682] [<c01045d5>] show_trace+0x12/0x14
[ 58.672787] [<c010465c>] dump_stack+0x16/0x18
[ 58.672893] [<c0126ccc>] local_bh_enable+0x8c/0x9b
[ 58.672998] [<c030a499>] lock_sock_nested+0xa3/0xab
[ 58.673107] [<c03080e1>] sock_fasync+0x3e/0x145
[ 58.673216] [<c0309056>] sock_close+0x19/0x3d
[ 58.673322] [<c0165baf>] __fput+0xa6/0x161
[ 58.673432] [<c0165e25>] fput+0x22/0x3b
[ 58.673538] [<c016358a>] filp_close+0x41/0x67
[ 58.673646] [<c01645f3>] sys_close+0x67/0xaf
[ 58.673753] [<c0102fe4>] syscall_call+0x7/0xb
[ 58.673855] =======================
[ 58.674091] ------------[ cut here ]------------
[ 58.674158] kernel BUG at /home/l/latest/xxx/fs/buffer.c:1244!
[ 58.674224] invalid opcode: 0000 [#1]
[ 58.674286] SMP
[ 58.674414] last sysfs file: /devices/platform/i2c-9191/9191-0290/fan3_min
[ 58.674478] Modules linked in: eth1394 floppy ohci1394 ieee1394 ide_cd cdrom
[ 58.674778] CPU: 1
[ 58.674779] EIP: 0060:[<c0181fa0>] Not tainted VLI
[ 58.674780] EFLAGS: 00010046 (2.6.20-rc1-mm1 #207)
[ 58.674971] EIP is at __find_get_block+0x165/0x171
[ 58.675035] eax: 00000092 ebx: f78e6ec0 ecx: 00001000 edx: 00008025
[ 58.675101] esi: 00000001 edi: 00001000 ebp: f76efc6c esp: f76efc34
[ 58.675166] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 58.675232] Process bash (pid: 1595, ti=f76ee000 task=c1a40560 task.ti=f76ee000)
[ 58.675297] Stack: c1b31ac0 f7df7e3c 0000003a c1b31bd0 f76efc74 c0181bb4 00000001 00000000
[ 58.675693] f7df7e3c f76efc74 c0181544 f78e6ec0 00000001 00001000 f76efc9c c0181fc2
[ 58.676083] f76efcb4 c0181eab 00008025 c1b31ac0 f74e5d40 f76efcb8 c01aa46a f78e6ec0
[ 58.676476] Call Trace:
[ 58.676594] [<c0103f1b>] show_trace_log_lvl+0x1a/0x30
[ 58.676692] [<c0103fd6>] show_stack_log_lvl+0xa5/0xca
[ 58.676789] [<c01041ce>] show_registers+0x1d3/0x2b8
[ 58.676887] [<c01043d4>] die+0x121/0x243
[ 58.676984] [<c010456c>] do_trap+0x76/0x9c
[ 58.677083] [<c0104dcf>] do_invalid_op+0x97/0xa1
[ 58.677181] [<c038a7e4>] error_code+0x7c/0x84
[ 58.677278] [<c0181fc2>] __getblk+0x16/0x20a
[ 58.677375] [<c019ec64>] __ext3_get_inode_loc+0x139/0x332
[ 58.677476] [<c019ee71>] ext3_get_inode_loc+0x14/0x16
[ 58.677575] [<c019ee93>] ext3_reserve_inode_write+0x20/0x6c
[ 58.677674] [<c019eeff>] ext3_mark_inode_dirty+0x20/0x37
[ 58.677772] [<c01a1cd0>] ext3_dirty_inode+0x6b/0x6d
[ 58.677871] [<c017e7c4>] __mark_inode_dirty+0x2a/0x170
[ 58.677969] [<c0176d3c>] touch_atime+0xb4/0xe8
[ 58.678067] [<c016ce4d>] __link_path_walk+0x91e/0xcb6
[ 58.678164] [<c016d22b>] link_path_walk+0x46/0xc3
[ 58.678262] [<c016d46f>] do_path_lookup+0x86/0x1b0
[ 58.678359] [<c016df00>] __path_lookup_intent_open+0x44/0x7f
[ 58.678457] [<c016dfb3>] path_lookup_open+0x21/0x27
[ 58.678555] [<c016e088>] open_namei+0x62/0x5cb
[ 58.678653] [<c01638d2>] do_filp_open+0x26/0x43
[ 58.678750] [<c0163930>] do_sys_open+0x41/0xca
[ 58.678847] [<c01639f1>] sys_open+0x1c/0x1e
[ 58.678943] [<c0102fe4>] syscall_call+0x7/0xb
[ 58.679040] =======================
[ 58.679101] Code: 45 d0 e8 b6 f5 ff ff e9 22 ff ff ff 89 d8 e8 aa f5 ff ff eb 8c 89 ce 8d 4e ff 8b 04 8f 89 04 b7 85 c9 75 f1 89 1f e9 f6 fe ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 55 89 e5 57 56 53 83 ec 1c
[ 58.681386] EIP: [<c0181fa0>] __find_get_block+0x165/0x171 SS:ESP 0068:f76efc34
[ 58.681545]

after gdb /bin/bash
(gdb) run

regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

2006-12-24 12:22:49

by Andrew Morton

Subject: Re: ptrace() memory corruption?

On Sun, 24 Dec 2006 12:51:16 +0059
Jiri Slaby <[email protected]> wrote:

> [ 58.674780] EFLAGS: 00010046 (2.6.20-rc1-mm1 #207)

please, test 2.6.20-rc2. We applied a fix for this.

2006-12-24 13:43:18

by Jiri Slaby

Subject: Re: ptrace() memory corruption?

Andrew Morton wrote:
> On Sun, 24 Dec 2006 12:51:16 +0059
> Jiri Slaby <[email protected]> wrote:
>
>> [ 58.674780] EFLAGS: 00010046 (2.6.20-rc1-mm1 #207)
>
> please, test 2.6.20-rc2. We applied a fix for this.

It's working now.

thanks,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

2006-12-24 15:03:51

by Marcel Holtmann

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

Hi Pavel,

> > I got this nasty oops while playing with debugger. Not sure if that is
> > related; it also might be something with bluetooth; I already know it
> > corrupts memory during suspend, perhaps it corrupts memory in some
> > error path?
>
> Okay, I spoke too soon. bluetooth & suspend memory corruption was
> _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> cycles... so it is probably unrelated to the previous crash.

can you try to reproduce this with 2.6.20-rc2 as well.

Regards

Marcel


2006-12-24 23:24:33

by Pavel Machek

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

Hi!

> > PM: Removing info for No Bus:usbdev3.15_ep81
> > PM: Removing info for No Bus:usbdev3.15_ep82
> > PM: Removing info for No Bus:usbdev3.15_ep02
> > slab error in verify_redzone_free(): cache `size-512': memory outside object was overwritten
> > [<c016a1b8>] cache_free_debugcheck+0x128/0x1d0
> > [<c04b58e3>] hci_usb_close+0xf3/0x160
> > [<c016b530>] kfree+0x50/0xa0
> > [<c04b58e3>] hci_usb_close+0xf3/0x160
> > [<c04b59be>] hci_usb_disconnect+0x2e/0x90
> > [<c0454f23>] usb_disable_interface+0x53/0x70
> > [<c04576f8>] usb_unbind_interface+0x38/0x80
> > [<c032f908>] __device_release_driver+0x68/0xb0
> > [<c032fc3e>] device_release_driver+0x1e/0x40
> > [<c032f1db>] bus_remove_device+0x8b/0xa0
> > [<c032dbc9>] device_del+0x159/0x1c0
> > [<c04559ad>] usb_disable_device+0x4d/0x100
> > [<c044fe8a>] usb_disconnect+0x9a/0x110
> > [<c0452405>] hub_thread+0x355/0xbd0
> > [<c061426e>] schedule+0x2de/0x8f0
> > [<c013c640>] autoremove_wake_function+0x0/0x50
> > [<c04520b0>] hub_thread+0x0/0xbd0
> > [<c013c58c>] kthread+0xec/0xf0
> > [<c013c4a0>] kthread+0x0/0xf0
> > [<c0103be7>] kernel_thread_helper+0x7/0x10
> > =======================
>
> yes, this one looks like memory scribblage in bluetooth. The
> buffer.c assertion failure should now be fixed, please verify.

I can confirm buffer.c assertion to be fixed (yes, I was using gdb at
that time).
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-12-24 23:38:35

by Pavel Machek

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

On Sun 2006-12-24 15:39:23, Marcel Holtmann wrote:
> Hi Pavel,
>
> > > I got this nasty oops while playing with debugger. Not sure if that is
> > > related; it also might be something with bluetooth; I already know it
> > > corrupts memory during suspend, perhaps it corrupts memory in some
> > > error path?
> >
> > Okay, I spoke too soon. bluetooth & suspend memory corruption was
> > _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> > cycles... so it is probably unrelated to the previous crash.
>
> can you try to reproduce this with 2.6.20-rc2 as well.

Yep, here it is, reproduced on 6-th-or-so suspend.

bluetooth may need to be actively used in order for this to trigger;
connecting to the net over my cellphone seems to work okay.

(Full logs in attachment).

Pavel

Linux version 2.6.20-rc2 (pavel@amd) (gcc version 4.0.4 20060507 (prerelease) (Debian 4.0.3-3)) #383 SMP Fri Dec 22 11:30:05 CET 2006
...
system 00:00: resuming
pnp 00:01: resuming
system 00:02: resuming
pnp 00:03: resuming
pnp 00:04: resuming
pnp 00:05: resuming
pnp 00:06: resuming
pnp 00:07: resuming
i8042 kbd 00:08: resuming
pnp: Device 00:08 does not support activation.
i8042 aux 00:09: resuming
pnp: Device 00:09 does not support activation.
pnp 00:0a: resuming
pnp 00:0b: resuming
platform bluetooth: resuming
pcspkr pcspkr: resuming
vesafb vesafb.0: resuming
serial8250 serial8250: resuming
usb usb1: resuming
usb usb3: resuming
ata2: SATA link down (SStatus 0 SControl 0)
ata3: SATA link down (SStatus 0 SControl 0)
ata4: SATA link down (SStatus 0 SControl 0)
hub 1-0:1.0: resuming
hub 3-0:1.0: resuming
i8042 i8042: resuming
atkbd serio0: resuming
psmouse serio1: resuming
usb usb4: resuming
usb usb5: resuming
hub 4-0:1.0: resuming
hub 5-0:1.0: resuming
usb usb2: resuming
hub 2-0:1.0: resuming
mmcblk mmc0:cc53: resuming
sd 0:0:0:0: resuming
usb 3-2: resuming
usbdev3.8_ep00: PM: resume from 0, parent 3-2 still 2
usb 3-2:1.0: PM: resume from 2, parent 3-2 still 2
usb 3-2:1.0: resuming
usbdev3.8_ep81: PM: resume from 0, parent 3-2:1.0 still 2
usbdev3.8_ep02: PM: resume from 0, parent 3-2:1.0 still 2
usbdev3.8_ep83: PM: resume from 0, parent 3-2:1.0 still 2
usb 3-1: resuming
usbdev3.9_ep00: PM: resume from 0, parent 3-1 still 2
hci_usb 3-1:1.0: PM: resume from 2, parent 3-1 still 2
hci_usb 3-1:1.0: resuming
hci0: PM: resume from 0, parent 3-1:1.0 still 2
usbdev3.9_ep81: PM: resume from 0, parent 3-1:1.0 still 2
usbdev3.9_ep82: PM: resume from 0, parent 3-1:1.0 still 2
usbdev3.9_ep02: PM: resume from 0, parent 3-1:1.0 still 2
hci_usb 3-1:1.1: PM: resume from 2, parent 3-1 still 2
hci_usb 3-1:1.1: resuming
usbdev3.9_ep83: PM: resume from 0, parent 3-1:1.1 still 2
usbdev3.9_ep03: PM: resume from 0, parent 3-1:1.1 still 2
usb 3-1:1.2: PM: resume from 2, parent 3-1 still 2
usb 3-1:1.2: resuming
usbdev3.9_ep84: PM: resume from 0, parent 3-1:1.2 still 2
usbdev3.9_ep04: PM: resume from 0, parent 3-1:1.2 still 2
usb 3-1:1.3: PM: resume from 2, parent 3-1 still 2
usb 3-1:1.3: resuming
Restarting tasks ... <3>__tx_submit: hci0 tx submit failed urb f765d1bc type 2 err -19
usb 3-1: USB disconnect, address 9
PM: Removing info for No Bus:usbdev3.9_ep81
PM: Removing info for No Bus:usbdev3.9_ep82
PM: Removing info for No Bus:usbdev3.9_ep02
slab error in verify_redzone_free(): cache `size-512': memory outside object was overwritten
[<c016a298>] cache_free_debugcheck+0x128/0x1d0
[<c04b08d3>] hci_usb_close+0xf3/0x160
[<c016b610>] kfree+0x50/0xa0
[<c04b08d3>] hci_usb_close+0xf3/0x160
[<c04b09ae>] hci_usb_disconnect+0x2e/0x90
[<c044fed3>] usb_disable_interface+0x53/0x70
[<c04526a8>] usb_unbind_interface+0x38/0x80
[<c032a8b8>] __device_release_driver+0x68/0xb0
[<c032abee>] device_release_driver+0x1e/0x40
[<c032a18b>] bus_remove_device+0x8b/0xa0
[<c0328b79>] device_del+0x159/0x1c0
[<c045095d>] usb_disable_device+0x4d/0x100
[<c044ae3a>] usb_disconnect+0x9a/0x110
[<c044d3b5>] hub_thread+0x355/0xbd0
[<c060f53e>] schedule+0x2de/0x8f0
[<c013c680>] autoremove_wake_function+0x0/0x50
[<c044d060>] hub_thread+0x0/0xbd0
[<c013c5cc>] kthread+0xec/0xf0
[<c013c4e0>] kthread+0x0/0xf0
[<c0103be7>] kernel_thread_helper+0x7/0x10
=======================
f70a2720: redzone 1:0x5a5a5a5a, redzone 2:0xc0545e9e.
------------[ cut here ]------------
kernel BUG at mm/slab.c:2878!
invalid opcode: 0000 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c016a322>] Not tainted VLI
EFLAGS: 00010012 (2.6.20-rc2 #383)
EIP is at cache_free_debugcheck+0x1b2/0x1d0
eax: f70a271c ebx: f70a20f8 ecx: 00052c00 edx: 0000020c
esi: c20df680 edi: f70a2720 ebp: 5a5a5a5a esp: c2313e30
ds: 007b es: 007b ss: 0068
Process khubd (pid: 304, ti=c2312000 task=c2257030 task.ti=c2312000)
Stack: c06aedf0 f70a2720 5a5a5a5a c0545e9e c04b08d3 f70a20c0 c20df680 c20d9164
f70a2724 00000286 c016b610 f653e8d8 f653e8c4 c2134ba0 0000000c c04b08d3
c2134b5c c2134b8c f62e0a54 c2134ad0 00000001 c2134ad0 f62e0a54 c07dbee0
Call Trace:
[<c0545e9e>] sock_alloc_send_skb+0x16e/0x1c0
[<c04b08d3>] hci_usb_close+0xf3/0x160
[<c016b610>] kfree+0x50/0xa0
[<c04b08d3>] hci_usb_close+0xf3/0x160
[<c04b09ae>] hci_usb_disconnect+0x2e/0x90
[<c044fed3>] usb_disable_interface+0x53/0x70
[<c04526a8>] usb_unbind_interface+0x38/0x80
[<c032a8b8>] __device_release_driver+0x68/0xb0
[<c032abee>] device_release_driver+0x1e/0x40
[<c032a18b>] bus_remove_device+0x8b/0xa0
[<c0328b79>] device_del+0x159/0x1c0
[<c045095d>] usb_disable_device+0x4d/0x100
[<c044ae3a>] usb_disconnect+0x9a/0x110
[<c044d3b5>] hub_thread+0x355/0xbd0
[<c060f53e>] schedule+0x2de/0x8f0
[<c013c680>] autoremove_wake_function+0x0/0x50
[<c044d060>] hub_thread+0x0/0xbd0
[<c013c5cc>] kthread+0xec/0xf0
[<c013c4e0>] kthread+0x0/0xf0
[<c0103be7>] kernel_thread_helper+0x7/0x10
=======================
Code: f0 2c 5a 75 8b b9 05 df 6a c0 89 f2 b8 88 98 61 c0 e8 73 f4 ff ff eb 89 81 fb a5 c2 0f 17 0f 85 6c ff ff ff 90 8d 74 26 00 eb 8e <0f> 0b eb fe 0f 0b eb fe 8d b6 00 00 00 00 0f 0b eb fe 8b 52 0c
EIP: [<c016a322>] cache_free_debugcheck+0x1b2/0x1d0 SS:ESP 0068:c2313e30
<7>PM: Adding info for No Bus:vcs63
PM: Adding info for No Bus:vcsa63
PM: Removing info for No Bus:vcs63
PM: Removing info for No Bus:vcsa63
done.
Enabling non-boot CPUs ...
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
Initializing CPU#1
Calibrating delay using timer specific routine.. 3657.64 BogoMIPS (lpj=18288234)
CPU: After generic identify, caps: bfe9fbff 00100000 00000000 00000000 0000c1a9 00000000 00000000
monitor/mwait feature present.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU: After all inits, caps: bfe9fbff 00100000 00000000 00002940 0000c1a9 00000000 00000000
CPU1: Intel Genuine Intel(R) CPU T2400 @ 1.83GHz stepping 08
PM: Adding info for No Bus:msr1
CPU1 is up
ata1: waiting for device to spin up (8 secs)
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
e1000: eth0: e1000_watchdog: 10/100 speed: disabling TSO
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/100
SCSI device sda: 117210240 512-byte hdwr sectors (60012 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (7.23 kB)
oops4.bz2 (15.12 kB)

2006-12-24 23:44:11

by Pavel Machek

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

Hi!

> > > I got this nasty oops while playing with debugger. Not sure if that is
> > > related; it also might be something with bluetooth; I already know it
> > > corrupts memory during suspend, perhaps it corrupts memory in some
> > > error path?
> >
> > Okay, I spoke too soon. bluetooth & suspend memory corruption was
> > _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> > cycles... so it is probably unrelated to the previous crash.
>
> can you try to reproduce this with 2.6.20-rc2 as well.

(reproduced in another mail).

	_urb_queue_tail(__pending_q(husb, _urb->type), _urb);
	err = usb_submit_urb(urb, GFP_ATOMIC);
	if (err) {
		BT_ERR("%s tx submit failed urb %p type %d err %d",
				husb->hdev->name, urb, _urb->type, err);
		_urb_unlink(_urb);

		~~~~~~~~~~~~~~~~~~
		Do we need to remove urb from pending_q here?

		_urb_queue_tail(__completed_q(husb, _urb->type), _urb);
	} else
		atomic_inc(__pending_tx(husb, _urb->type));

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
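
Editorial illustration of the question above: the invariant at stake is that an entry sits on at most one queue, so it has to leave the pending queue before it is appended to the completed queue. The toy userspace model below shows that sequence for the failing-submit path (the log shows err -19, which is -ENODEV on Linux). All `toy_*` names are hypothetical, and whether the driver's real `_urb_unlink()` already removes the entry from the pending queue is exactly what the thread leaves open.

```c
#include <stddef.h>

struct toy_node { struct toy_node *prev, *next; };
struct toy_queue { struct toy_node head; };

/* An entry remembers which queue (if any) it is currently linked on. */
struct toy_urb { struct toy_node link; struct toy_queue *queue; };

static void toy_queue_init(struct toy_queue *q)
{
	q->head.prev = q->head.next = &q->head;
}

/* Append to the tail; the caller must have unlinked the entry first. */
static void toy_queue_tail(struct toy_queue *q, struct toy_urb *u)
{
	struct toy_node *last = q->head.prev;

	u->link.prev = last;
	u->link.next = &q->head;
	last->next = &u->link;
	q->head.prev = &u->link;
	u->queue = q;
}

/* Remove the entry from whatever queue it is on; no-op if unqueued. */
static void toy_unlink(struct toy_urb *u)
{
	if (!u->queue)
		return;
	u->link.prev->next = u->link.next;
	u->link.next->prev = u->link.prev;
	u->queue = NULL;
}

static size_t toy_queue_len(const struct toy_queue *q)
{
	size_t n = 0;

	for (const struct toy_node *p = q->head.next; p != &q->head; p = p->next)
		n++;
	return n;
}

/* Models the submit path; a nonzero submit_err simulates a failed submit. */
static void toy_tx_submit(struct toy_queue *pending, struct toy_queue *completed,
			  struct toy_urb *u, int submit_err)
{
	toy_queue_tail(pending, u);
	if (submit_err) {
		toy_unlink(u);			/* leave pending first */
		toy_queue_tail(completed, u);	/* then move to completed */
	}
}
```

With the unlink in place, a failed submit leaves the pending queue empty and the entry on the completed queue only. Dropping the `toy_unlink()` call in this model would leave the pending queue's links pointing at an entry that now belongs to another list, which is the kind of stale reference that can later show up as memory corruption.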

2006-12-28 08:43:07

by Pavel Machek

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

Hi!

> > > > I got this nasty oops while playing with debugger. Not sure if that is
> > > > related; it also might be something with bluetooth; I already know it
> > > > corrupts memory during suspend, perhaps it corrupts memory in some
> > > > error path?
> > >
> > > Okay, I spoke too soon. bluetooth & suspend memory corruption was
> > > _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> > > cycles... so it is probably unrelated to the previous crash.
> >
> > can you try to reproduce this with 2.6.20-rc2 as well.
>
> (reproduced in another mail).
>
> _urb_queue_tail(__pending_q(husb, _urb->type), _urb);
> err = usb_submit_urb(urb, GFP_ATOMIC);
> if (err) {
> BT_ERR("%s tx submit failed urb %p type %d err %d",
> husb->hdev->name, urb, _urb->type, err);
> _urb_unlink(_urb);
>
> ~~~~~~~~~~~~~~~~~~
> Do we need to remove urb from pending_q here?
>
> _urb_queue_tail(__completed_q(husb, _urb->type), _urb);
> } else
> atomic_inc(__pending_tx(husb, _urb->type));
>

Any news? Should I convert above idea to a patch? Or should I make
bluetooth suspend() routine return error so corruption is impossible
to hit?
Pavel
--
Thanks for all the (sleeping) penguins.

2006-12-28 10:44:08

by Marcel Holtmann

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

Hi Pavel,

> > > > > I got this nasty oops while playing with debugger. Not sure if that is
> > > > > related; it also might be something with bluetooth; I already know it
> > > > > corrupts memory during suspend, perhaps it corrupts memory in some
> > > > > error path?
> > > >
> > > > Okay, I spoke too soon. bluetooth & suspend memory corruption was
> > > > _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> > > > cycles... so it is probably unrelated to the previous crash.
> > >
> > > can you try to reproduce this with 2.6.20-rc2 as well.
> >
> > (reproduced in another mail).
> >
> > _urb_queue_tail(__pending_q(husb, _urb->type), _urb);
> > err = usb_submit_urb(urb, GFP_ATOMIC);
> > if (err) {
> > BT_ERR("%s tx submit failed urb %p type %d err %d",
> > husb->hdev->name, urb, _urb->type, err);
> > _urb_unlink(_urb);
> >
> > ~~~~~~~~~~~~~~~~~~
> > Do we need to remove urb from pending_q here?
> >
> > _urb_queue_tail(__completed_q(husb, _urb->type), _urb);
> > } else
> > atomic_inc(__pending_tx(husb, _urb->type));
> >
>
> Any news? Should I convert above idea to a patch? Or should I make
> bluetooth suspend() routine return error so corruption is impossible
> to hit?

to be honest, I have no idea. This code is way too ugly anyway.

Regards

Marcel


2006-12-30 21:52:48

by Adrian Bunk

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

On Mon, Dec 25, 2006 at 12:36:47AM +0100, Pavel Machek wrote:
> On Sun 2006-12-24 15:39:23, Marcel Holtmann wrote:
> > Hi Pavel,
> >
> > > > I got this nasty oops while playing with debugger. Not sure if that is
> > > > related; it also might be something with bluetooth; I already know it
> > > > corrupts memory during suspend, perhaps it corrupts memory in some
> > > > error path?
> > >
> > > Okay, I spoke too soon. bluetooth & suspend memory corruption was
> > > _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> > > cycles... so it is probably unrelated to the previous crash.
> >
> > can you try to reproduce this with 2.6.20-rc2 as well.
>
> Yep, here it is, reproduced on 6-th-or-so suspend.
>
> bluetooth may need to be actively used in order for this to trigger;
> connecting to the net over my cellphone seems to work okay.
>
> (Full logs in attachment).

Is this issue also present in 2.6.19 or is it a regression?

> Pavel

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2007-01-01 19:01:23

by Pavel Machek

Subject: Re: bluetooth memory corruption (was Re: ext3-related crash in 2.6.20-rc1)

Hi!

> > > > Okay, I spoke too soon. bluetooth & suspend memory corruption was
> > > > _way_ harder to reproduce than expected. Took me 5-or-so-suspend
> > > > cycles... so it is probably unrelated to the previous crash.
> > >
> > > can you try to reproduce this with 2.6.20-rc2 as well.
> >
> > Yep, here it is, reproduced on 6-th-or-so suspend.
> >
> > bluetooth may need to be actively used in order for this to trigger;
> > connecting to the net over my cellphone seems to work okay.
> >
> > (Full logs in attachment).
>
> Is this issue also present in 2.6.19 or is it a regression?

Not sure... but I know there were some bluetooth & suspend problems
before.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html