2007-12-28 17:32:03

by Gabor Gombas

[permalink] [raw]
Subject: [Bluez-devel] Oops involving RFCOMM and sysfs

Hi,

I'm using my phone as a GPRS modem over Bluetooth. Sometimes Bluetooth
on the phone seems to stall and the phone has to be switched off & on to
get it back to a sane state. During this I sometimes get the following
Oops (this is kernel 2.6.24-rc6 on x86_64):

Dec 28 17:52:11 host kernel: Unable to handle kernel NULL pointer dereference at 00000000000000b8 RIP:
Dec 28 17:52:11 host kernel: [<ffffffff8046e0d5>] mutex_lock+0x10/0x1d
Dec 28 17:52:11 host kernel: PGD 920b0067 PUD a7aba067 PMD 0
Dec 28 17:52:11 host kernel: Oops: 0002 [1]
Dec 28 17:52:11 host kernel: CPU 0
Dec 28 17:52:11 host kernel: Modules linked in: cpufreq_ondemand ppp_deflate zlib_deflate zlib_inflate bsd_comp ppp_async crc_ccitt ppp_generic slhc binfmt_misc rfcomm l2cap nfsd auth_rpcgss exportfs ipt_REJECT xt_tcpudp ipt_LOG xt_limit iptable_filter ip_tables x_tables nfs lockd nfs_acl sunrpc fuse dm_crypt dm_snapshot dm_mirror saa7134_alsa radeon hwmon_vid eeprom hci_usb bluetooth usb_storage tuner tea5767 tda8290 tuner_simple mt20xx tea5761 sg snd_intel8x0 saa7134 firewire_ohci snd_ac97_codec firewire_core videobuf_dma_sg videobuf_core ir_kbd_i2c ac97_bus crc_itu_t ir_common snd_pcm ehci_hcd ohci_hcd parport_pc parport sky2 sr_mod forcedeth snd_timer snd_page_alloc cdrom floppy
Dec 28 17:52:11 host kernel: Pid: 4381, comm: pppd Not tainted 2.6.24-rc6 #1
Dec 28 17:52:11 host kernel: RIP: 0010:[<ffffffff8046e0d5>] [<ffffffff8046e0d5>] mutex_lock+0x10/0x1d
Dec 28 17:52:11 host kernel: RSP: 0018:ffff81009402fd08 EFLAGS: 00010246
Dec 28 17:52:11 host kernel: RAX: 0000000000000000 RBX: 00000000000000b8 RCX: 0000000000000000
Dec 28 17:52:11 host kernel: RDX: ffff81009402ffd8 RSI: ffffffff80639d70 RDI: 00000000000000b8
Dec 28 17:52:11 host kernel: RBP: ffffffff8058adb0 R08: 6390dfaa7e3bd395 R09: ffffffff8050597e
Dec 28 17:52:11 host kernel: R10: 0000000000004000 R11: 0000003000000018 R12: ffff8100b4d75e60
Dec 28 17:52:11 host kernel: R13: 00000000fffffff4 R14: ffff8100abe9e180 R15: ffff8100a7a10900
Dec 28 17:52:11 host kernel: FS: 00002ab1bddc3ac0(0000) GS:ffffffff805b0000(0000) knlGS:0000000000000000
Dec 28 17:52:11 host kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 28 17:52:11 host kernel: CR2: 00000000000000b8 CR3: 000000009400e000 CR4: 00000000000006e0
Dec 28 17:52:11 host kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 28 17:52:11 host kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 28 17:52:11 host kernel: Process pppd (pid: 4381, threadinfo ffff81009402e000, task ffff81009403d060)
Dec 28 17:52:11 host kernel: Stack: ffff81009403d060 ffffffff80285555 ffff810093734d48 ffffffff802aefa5
Dec 28 17:52:11 host kernel: ffff8100bf9e1c80 ffff8100b4d75e60 ffff810060720600 ffffffff802af1ba
Dec 28 17:52:11 host kernel: ffff8100b6e5ced0 ffff8100b6e5ced0 ffff8100bfa11c00 ffff810060720600
Dec 28 17:52:11 host kernel: Call Trace:
Dec 28 17:52:11 host kernel: [<ffffffff80285555>] dput+0x26/0x103
Dec 28 17:52:11 host kernel: [<ffffffff802aefa5>] sysfs_get_dentry+0x45/0x8f
Dec 28 17:52:11 host kernel: [<ffffffff802af1ba>] sysfs_move_dir+0x63/0x204
Dec 28 17:52:11 host kernel: [<ffffffff803006e5>] kobject_move+0xba/0x110
Dec 28 17:52:11 host kernel: [<ffffffff80368640>] device_move+0x59/0x111
Dec 28 17:52:11 host kernel: [<ffffffff8828841c>] :rfcomm:rfcomm_tty_close+0x2f/0x74
Dec 28 17:52:11 host kernel: [<ffffffff803442ff>] release_dev+0x212/0x5e2
Dec 28 17:52:11 host kernel: [<ffffffff80288801>] mntput_no_expire+0x1f/0x6f
Dec 28 17:52:11 host kernel: [<ffffffff803e845d>] sys_sendto+0x128/0x151
Dec 28 17:52:11 host kernel: [<ffffffff803446db>] tty_release+0xc/0x10
Dec 28 17:52:11 host kernel: [<ffffffff80276f67>] __fput+0xb1/0x16f
Dec 28 17:52:11 host kernel: [<ffffffff802748b5>] filp_close+0x5d/0x65
Dec 28 17:52:11 host kernel: [<ffffffff80275a07>] sys_close+0x73/0xa6
Dec 28 17:52:11 host kernel: [<ffffffff8020b5fc>] tracesys+0xdc/0xe1
Dec 28 17:52:11 host kernel:
Dec 28 17:52:11 host kernel:
Dec 28 17:52:11 host kernel: Code: ff 0f 79 05 e8 c9 00 00 00 58 5a 5b c3 41 54 48 8d 47 08 48
Dec 28 17:52:11 host kernel: RIP [<ffffffff8046e0d5>] mutex_lock+0x10/0x1d
Dec 28 17:52:11 host kernel: RSP <ffff81009402fd08>
Dec 28 17:52:11 host kernel: CR2: 00000000000000b8
Dec 28 17:52:11 host kernel: ---[ end trace 12717319afdb56f5 ]---

The bug is also present in 2.6.23.9:

Dec 28 15:30:29 twister kernel: Unable to handle kernel paging request at fffffffffffffffe RIP:
Dec 28 15:30:29 twister kernel: [<ffffffff8027ac13>] dput+0xd/0x103
Dec 28 15:30:29 twister kernel: PGD 203067 PUD 204067 PMD 0
Dec 28 15:30:29 twister kernel: Oops: 0000 [1]
Dec 28 15:30:29 twister kernel: CPU 0
Dec 28 15:30:29 twister kernel: Modules linked in: ppp_deflate zlib_deflate zlib_inflate bsd_comp ppp_async crc_ccitt ppp_generic slhc binfmt_misc rfcomm l2cap nfsd exportfs auth_rpcgss ipt_REJECT xt_tcpudp ipt_LOG xt_limit iptable_filter ip_tables x_tables nfs lockd nfs_acl sunrpc fuse dm_crypt dm_snapshot dm_mirror saa7134_alsa radeon it87 hwmon_vid eeprom hci_usb bluetooth tuner usb_storage saa7134 video_buf ir_kbd_i2c snd_intel8x0 sg firewire_ohci firewire_core ir_common crc_itu_t snd_ac97_codec ac97_bus forcedeth parport_pc parport sky2 ohci_hcd snd_pcm snd_timer snd_page_alloc ehci_hcd sr_mod cdrom floppy
Dec 28 15:30:29 twister kernel: Pid: 14700, comm: pppd Not tainted 2.6.23.9 #1
Dec 28 15:30:29 twister kernel: RIP: 0010:[<ffffffff8027ac13>] [<ffffffff8027ac13>] dput+0xd/0x103
Dec 28 15:30:29 twister kernel: RSP: 0018:ffff810054e5bd28 EFLAGS: 00010282
Dec 28 15:30:29 twister kernel: RAX: fffffffffffffffe RBX: fffffffffffffffe RCX: 0000000000000052
Dec 28 15:30:29 twister kernel: RDX: ffffffff802f6414 RSI: ffff81000411b4e0 RDI: fffffffffffffffe
Dec 28 15:30:29 twister kernel: RBP: ffff8100bfef14d0 R08: ffffffffffffffff R09: 0000000000005401
Dec 28 15:30:29 twister kernel: R10: 00007fff7018a2e0 R11: 0000000000000246 R12: ffff8100b3731948
Dec 28 15:30:29 twister kernel: R13: 0000000000000000 R14: ffff81005a1cb100 R15: fffffffffffffffe
Dec 28 15:30:29 twister kernel: FS: 00002acd3a94cac0(0000) GS:ffffffff80592000(0000) knlGS:00000000f7d786b0
Dec 28 15:30:29 twister kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 28 15:30:29 twister kernel: CR2: fffffffffffffffe CR3: 0000000047f21000 CR4: 00000000000006e0
Dec 28 15:30:29 twister kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 28 15:30:29 twister kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 28 15:30:29 twister kernel: Process pppd (pid: 14700, threadinfo ffff810054e5a000, task ffff81007907ce40)
Dec 28 15:30:29 twister kernel: Stack: 00000000fffffffe ffffffff802a5a81 0000000000000053 ffff81000411b4e0
Dec 28 15:30:29 twister kernel: 0000000000000000 ffff8100b539f8d0 ffff81000411b4e0 ffff8100b04e7060
Dec 28 15:30:29 twister kernel: 00000000fffffff4 ffff81005a1cb100 ffff8100b09db800 ffffffff802f6448
Dec 28 15:30:29 twister kernel: Call Trace:
Dec 28 15:30:29 twister kernel: [<ffffffff802a5a81>] sysfs_move_dir+0x1d2/0x1ed
Dec 28 15:30:29 twister kernel: [<ffffffff802f6448>] kobject_move+0xba/0x110
Dec 28 15:30:29 twister kernel: [<ffffffff8035e0f0>] device_move+0x59/0x111
Dec 28 15:30:29 twister kernel: [<ffffffff8827c382>] :rfcomm:rfcomm_tty_close+0x2f/0x74
Dec 28 15:30:29 twister kernel: [<ffffffff803397ff>] release_dev+0x212/0x5e2
Dec 28 15:30:29 twister kernel: [<ffffffff802d4206>] inode_has_perm+0x65/0x72
Dec 28 15:30:29 twister kernel: [<ffffffff802d42a7>] file_has_perm+0x94/0xa3
Dec 28 15:30:29 twister kernel: [<ffffffff80339bdb>] tty_release+0xc/0x11
Dec 28 15:30:29 twister kernel: [<ffffffff8026c578>] __fput+0xb1/0x177
Dec 28 15:30:29 twister kernel: [<ffffffff80269ecc>] filp_close+0x5d/0x65
Dec 28 15:30:29 twister kernel: [<ffffffff8026b01a>] sys_close+0x72/0xa5
Dec 28 15:30:29 twister kernel: [<ffffffff8020b5cc>] tracesys+0xdc/0xe1
Dec 28 15:30:29 twister kernel:
Dec 28 15:30:29 twister kernel:
Dec 28 15:30:29 twister kernel: Code: 83 3b 01 75 05 e8 e8 29 1e 00 48 c7 c6 08 de 61 80 48 89 df
Dec 28 15:30:29 twister kernel: RIP [<ffffffff8027ac13>] dput+0xd/0x103
Dec 28 15:30:29 twister kernel: RSP <ffff810054e5bd28>
Dec 28 15:30:29 twister kernel: CR2: fffffffffffffffe

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel


2007-12-29 08:07:04

by Dave Young

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Dec 29, 2007 1:32 AM, Gabor Gombas <[email protected]> wrote:
> Hi,
>
> I'm using my phone as a GPRS modem over Bluetooth. Sometimes Bluetooth
> on the phone seems to stall and the phone has to be switched off & on to
> get it back to a sane state. During this I sometimes get the following
> Oops (this is kernel 2.6.24-rc6 on x86_64):
>
> Dec 28 17:52:11 host kernel: Unable to handle kernel NULL pointer dereference at 00000000000000b8 RIP:
> Dec 28 17:52:11 host kernel: [<ffffffff8046e0d5>] mutex_lock+0x10/0x1d
> Dec 28 17:52:11 host kernel: PGD 920b0067 PUD a7aba067 PMD 0
> Dec 28 17:52:11 host kernel: Oops: 0002 [1]
> Dec 28 17:52:11 host kernel: CPU 0
> Dec 28 17:52:11 host kernel: Modules linked in: cpufreq_ondemand ppp_deflate zlib_deflate zlib_inflate bsd_comp ppp_async crc_ccitt ppp_generic slhc binfmt_misc rfcomm l2cap nfsd auth_rpcgss exportfs ipt_REJECT xt_tcpudp ipt_LOG xt_limit iptable_filter ip_tables x_tables nfs lockd nfs_acl sunrpc fuse dm_crypt dm_snapshot dm_mirror saa7134_alsa radeon hwmon_vid eeprom hci_usb bluetooth usb_storage tuner tea5767 tda8290 tuner_simple mt20xx tea5761 sg snd_intel8x0 saa7134 firewire_ohci snd_ac97_codec firewire_core videobuf_dma_sg videobuf_core ir_kbd_i2c ac97_bus crc_itu_t ir_common snd_pcm ehci_hcd ohci_hcd parport_pc parport sky2 sr_mod forcedeth snd_timer snd_page_alloc cdrom floppy
> Dec 28 17:52:11 host kernel: Pid: 4381, comm: pppd Not tainted 2.6.24-rc6 #1
> Dec 28 17:52:11 host kernel: RIP: 0010:[<ffffffff8046e0d5>] [<ffffffff8046e0d5>] mutex_lock+0x10/0x1d
> Dec 28 17:52:11 host kernel: RSP: 0018:ffff81009402fd08 EFLAGS: 00010246
> Dec 28 17:52:11 host kernel: RAX: 0000000000000000 RBX: 00000000000000b8 RCX: 0000000000000000
> Dec 28 17:52:11 host kernel: RDX: ffff81009402ffd8 RSI: ffffffff80639d70 RDI: 00000000000000b8
> Dec 28 17:52:11 host kernel: RBP: ffffffff8058adb0 R08: 6390dfaa7e3bd395 R09: ffffffff8050597e
> Dec 28 17:52:11 host kernel: R10: 0000000000004000 R11: 0000003000000018 R12: ffff8100b4d75e60
> Dec 28 17:52:11 host kernel: R13: 00000000fffffff4 R14: ffff8100abe9e180 R15: ffff8100a7a10900
> Dec 28 17:52:11 host kernel: FS: 00002ab1bddc3ac0(0000) GS:ffffffff805b0000(0000) knlGS:0000000000000000
> Dec 28 17:52:11 host kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Dec 28 17:52:11 host kernel: CR2: 00000000000000b8 CR3: 000000009400e000 CR4: 00000000000006e0
> Dec 28 17:52:11 host kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 28 17:52:11 host kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 28 17:52:11 host kernel: Process pppd (pid: 4381, threadinfo ffff81009402e000, task ffff81009403d060)
> Dec 28 17:52:11 host kernel: Stack: ffff81009403d060 ffffffff80285555 ffff810093734d48 ffffffff802aefa5
> Dec 28 17:52:11 host kernel: ffff8100bf9e1c80 ffff8100b4d75e60 ffff810060720600 ffffffff802af1ba
> Dec 28 17:52:11 host kernel: ffff8100b6e5ced0 ffff8100b6e5ced0 ffff8100bfa11c00 ffff810060720600
> Dec 28 17:52:11 host kernel: Call Trace:
> Dec 28 17:52:11 host kernel: [<ffffffff80285555>] dput+0x26/0x103
> Dec 28 17:52:11 host kernel: [<ffffffff802aefa5>] sysfs_get_dentry+0x45/0x8f
> Dec 28 17:52:11 host kernel: [<ffffffff802af1ba>] sysfs_move_dir+0x63/0x204
> Dec 28 17:52:11 host kernel: [<ffffffff803006e5>] kobject_move+0xba/0x110
> Dec 28 17:52:11 host kernel: [<ffffffff80368640>] device_move+0x59/0x111
> Dec 28 17:52:11 host kernel: [<ffffffff8828841c>] :rfcomm:rfcomm_tty_close+0x2f/0x74
> Dec 28 17:52:11 host kernel: [<ffffffff803442ff>] release_dev+0x212/0x5e2
> Dec 28 17:52:11 host kernel: [<ffffffff80288801>] mntput_no_expire+0x1f/0x6f
> Dec 28 17:52:11 host kernel: [<ffffffff803e845d>] sys_sendto+0x128/0x151
> Dec 28 17:52:11 host kernel: [<ffffffff803446db>] tty_release+0xc/0x10
> Dec 28 17:52:11 host kernel: [<ffffffff80276f67>] __fput+0xb1/0x16f
> Dec 28 17:52:11 host kernel: [<ffffffff802748b5>] filp_close+0x5d/0x65
> Dec 28 17:52:11 host kernel: [<ffffffff80275a07>] sys_close+0x73/0xa6
> Dec 28 17:52:11 host kernel: [<ffffffff8020b5fc>] tracesys+0xdc/0xe1
> Dec 28 17:52:11 host kernel:
> Dec 28 17:52:11 host kernel:
> Dec 28 17:52:11 host kernel: Code: ff 0f 79 05 e8 c9 00 00 00 58 5a 5b c3 41 54 48 8d 47 08 48
> Dec 28 17:52:11 host kernel: RIP [<ffffffff8046e0d5>] mutex_lock+0x10/0x1d
> Dec 28 17:52:11 host kernel: RSP <ffff81009402fd08>
> Dec 28 17:52:11 host kernel: CR2: 00000000000000b8
> Dec 28 17:52:11 host kernel: ---[ end trace 12717319afdb56f5 ]---
>
> The bug is also present in 2.6.23.9:
>
> Dec 28 15:30:29 twister kernel: Unable to handle kernel paging request at fffffffffffffffe RIP:
> Dec 28 15:30:29 twister kernel: [<ffffffff8027ac13>] dput+0xd/0x103
> Dec 28 15:30:29 twister kernel: PGD 203067 PUD 204067 PMD 0
> Dec 28 15:30:29 twister kernel: Oops: 0000 [1]
> Dec 28 15:30:29 twister kernel: CPU 0
> Dec 28 15:30:29 twister kernel: Modules linked in: ppp_deflate zlib_deflate zlib_inflate bsd_comp ppp_async crc_ccitt ppp_generic slhc binfmt_misc rfcomm l2cap nfsd exportfs auth_rpcgss ipt_REJECT xt_tcpudp ipt_LOG xt_limit iptable_filter ip_tables x_tables nfs lockd nfs_acl sunrpc fuse dm_crypt dm_snapshot dm_mirror saa7134_alsa radeon it87 hwmon_vid eeprom hci_usb bluetooth tuner usb_storage saa7134 video_buf ir_kbd_i2c snd_intel8x0 sg firewire_ohci firewire_core ir_common crc_itu_t snd_ac97_codec ac97_bus forcedeth parport_pc parport sky2 ohci_hcd snd_pcm snd_timer snd_page_alloc ehci_hcd sr_mod cdrom floppy
> Dec 28 15:30:29 twister kernel: Pid: 14700, comm: pppd Not tainted 2.6.23.9 #1
> Dec 28 15:30:29 twister kernel: RIP: 0010:[<ffffffff8027ac13>] [<ffffffff8027ac13>] dput+0xd/0x103
> Dec 28 15:30:29 twister kernel: RSP: 0018:ffff810054e5bd28 EFLAGS: 00010282
> Dec 28 15:30:29 twister kernel: RAX: fffffffffffffffe RBX: fffffffffffffffe RCX: 0000000000000052
> Dec 28 15:30:29 twister kernel: RDX: ffffffff802f6414 RSI: ffff81000411b4e0 RDI: fffffffffffffffe
> Dec 28 15:30:29 twister kernel: RBP: ffff8100bfef14d0 R08: ffffffffffffffff R09: 0000000000005401
> Dec 28 15:30:29 twister kernel: R10: 00007fff7018a2e0 R11: 0000000000000246 R12: ffff8100b3731948
> Dec 28 15:30:29 twister kernel: R13: 0000000000000000 R14: ffff81005a1cb100 R15: fffffffffffffffe
> Dec 28 15:30:29 twister kernel: FS: 00002acd3a94cac0(0000) GS:ffffffff80592000(0000) knlGS:00000000f7d786b0
> Dec 28 15:30:29 twister kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Dec 28 15:30:29 twister kernel: CR2: fffffffffffffffe CR3: 0000000047f21000 CR4: 00000000000006e0
> Dec 28 15:30:29 twister kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec 28 15:30:29 twister kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec 28 15:30:29 twister kernel: Process pppd (pid: 14700, threadinfo ffff810054e5a000, task ffff81007907ce40)
> Dec 28 15:30:29 twister kernel: Stack: 00000000fffffffe ffffffff802a5a81 0000000000000053 ffff81000411b4e0
> Dec 28 15:30:29 twister kernel: 0000000000000000 ffff8100b539f8d0 ffff81000411b4e0 ffff8100b04e7060
> Dec 28 15:30:29 twister kernel: 00000000fffffff4 ffff81005a1cb100 ffff8100b09db800 ffffffff802f6448
> Dec 28 15:30:29 twister kernel: Call Trace:
> Dec 28 15:30:29 twister kernel: [<ffffffff802a5a81>] sysfs_move_dir+0x1d2/0x1ed
> Dec 28 15:30:29 twister kernel: [<ffffffff802f6448>] kobject_move+0xba/0x110
> Dec 28 15:30:29 twister kernel: [<ffffffff8035e0f0>] device_move+0x59/0x111
> Dec 28 15:30:29 twister kernel: [<ffffffff8827c382>] :rfcomm:rfcomm_tty_close+0x2f/0x74
> Dec 28 15:30:29 twister kernel: [<ffffffff803397ff>] release_dev+0x212/0x5e2
> Dec 28 15:30:29 twister kernel: [<ffffffff802d4206>] inode_has_perm+0x65/0x72
> Dec 28 15:30:29 twister kernel: [<ffffffff802d42a7>] file_has_perm+0x94/0xa3
> Dec 28 15:30:29 twister kernel: [<ffffffff80339bdb>] tty_release+0xc/0x11
> Dec 28 15:30:29 twister kernel: [<ffffffff8026c578>] __fput+0xb1/0x177
> Dec 28 15:30:29 twister kernel: [<ffffffff80269ecc>] filp_close+0x5d/0x65
> Dec 28 15:30:29 twister kernel: [<ffffffff8026b01a>] sys_close+0x72/0xa5
> Dec 28 15:30:29 twister kernel: [<ffffffff8020b5cc>] tracesys+0xdc/0xe1
> Dec 28 15:30:29 twister kernel:
> Dec 28 15:30:29 twister kernel:
> Dec 28 15:30:29 twister kernel: Code: 83 3b 01 75 05 e8 e8 29 1e 00 48 c7 c6 08 de 61 80 48 89 df
> Dec 28 15:30:29 twister kernel: RIP [<ffffffff8027ac13>] dput+0xd/0x103
> Dec 28 15:30:29 twister kernel: RSP <ffff810054e5bd28>
> Dec 28 15:30:29 twister kernel: CR2: fffffffffffffffe
>

Please try the -mm tree kernel, might have been fixed by :
http://lkml.org/lkml/2007/11/18/141

Regards
dave

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-11 23:09:29

by Gabor Gombas

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Thu, Jan 10, 2008 at 09:11:17AM +0800, Dave Young wrote:

> For bluetooth device_move, the only child device of hci_conn dev is
> the rfcomm tty_dev. How about the following patch, please verify :

There is now no oops, instead the keyboard becomes almost completely
unresponsible when I switch off & on the phone. The mouse still works
(tested both with X and with the VGA console), but terminal input and VT
switching is dead. On the VGA console, even scrolling using Shift+PgUp
stops working.

Alt+SysRq+w works, trace is below. Ctrl+Alt+Del also works, but then
init complains about hung processes and that it can not umount the file
systems and cannot stop the RAID arrays. Once it still rebooted though,
the second time it got hung after trying to umount the filesystems, and
I had to use Alt+SysRq+b.

If I can choose then I prefer the Oops...

Jan 11 23:46:35 twister kernel: SysRq : Emergency Sync
Jan 11 23:46:35 twister kernel: Emergency Sync complete
Jan 11 23:46:42 twister kernel: SysRq : Show Blocked State
Jan 11 23:46:42 twister kernel: task PC stack pid father
Jan 11 23:46:42 twister kernel: events/0 D ffffffff80487190 0 5 2
Jan 11 23:46:42 twister kernel: ffff8100bf84fd40 0000000000000046 ffff8100ae7f2600 ffff8100bf84fd00
Jan 11 23:46:42 twister kernel: ffff8100bf84a830 ffff8100bd872000 ffffffff80652ae0 0000000100000100
Jan 11 23:46:42 twister kernel: 0000000000000000 7fffffffffffffff 7fffffffffffffff 0000000000000002
Jan 11 23:46:42 twister kernel: Call Trace:
Jan 11 23:46:42 twister kernel: [<ffffffff80473543>] schedule_timeout+0x1e/0xad
Jan 11 23:46:42 twister kernel: [<ffffffff8028560d>] dput+0x26/0x103
Jan 11 23:46:42 twister kernel: [<ffffffff802af3fd>] sysfs_move_dir+0x1ee/0x1fd
Jan 11 23:46:42 twister kernel: [<ffffffff8047340b>] wait_for_common+0xc4/0x129
Jan 11 23:46:42 twister kernel: [<ffffffff80224204>] default_wake_function+0x0/0xe
Jan 11 23:46:42 twister kernel: [<ffffffff80471d72>] klist_del+0x15/0x2e
Jan 11 23:46:42 twister kernel: [<ffffffff803698f3>] device_move+0x80/0x111
Jan 11 23:46:42 twister kernel: [<ffffffff88146a27>] :bluetooth:hci_conn_move_child+0x0/0xf
Jan 11 23:46:42 twister kernel: [<ffffffff88146a32>] :bluetooth:hci_conn_move_child+0xb/0xf
Jan 11 23:46:42 twister kernel: [<ffffffff80369836>] device_for_each_child+0x22/0x4d
Jan 11 23:46:42 twister kernel: [<ffffffff88146a05>] :bluetooth:del_conn+0x0/0x22
Jan 11 23:46:42 twister kernel: [<ffffffff88146a1e>] :bluetooth:del_conn+0x19/0x22
Jan 11 23:46:42 twister kernel: [<ffffffff8023696d>] run_workqueue+0x74/0xee
Jan 11 23:46:42 twister kernel: [<ffffffff80236ff8>] worker_thread+0x0/0xe7
Jan 11 23:46:42 twister kernel: [<ffffffff802370d2>] worker_thread+0xda/0xe7
Jan 11 23:46:42 twister kernel: [<ffffffff80239b77>] autoremove_wake_function+0x0/0x2e
Jan 11 23:46:42 twister kernel: [<ffffffff80236ff8>] worker_thread+0x0/0xe7
Jan 11 23:46:42 twister kernel: [<ffffffff802399a0>] kthread+0x47/0x73
Jan 11 23:46:42 twister kernel: [<ffffffff8020bdc8>] child_rip+0xa/0x12
Jan 11 23:46:42 twister kernel: [<ffffffff80239959>] kthread+0x0/0x73
Jan 11 23:46:42 twister kernel: [<ffffffff8020bdbe>] child_rip+0x0/0x12
Jan 11 23:46:42 twister kernel:
Jan 11 23:46:42 twister kernel: cat D 7fffffffffffffff 0 3978 3965
Jan 11 23:46:42 twister kernel: ffff81009e857ce8 0000000000000086 ffff8100ae7f2900 ffff81009e857ca8
Jan 11 23:46:42 twister kernel: ffff8100bd872000 ffff8100bf9e7060 ffffffff80652ae0 00000001000000c0
Jan 11 23:46:42 twister kernel: 0000000000000000 7fffffffffffffff 7fffffffffffffff 0000000000000002
Jan 11 23:46:42 twister kernel: Call Trace:
Jan 11 23:46:42 twister kernel: [<ffffffff80473543>] schedule_timeout+0x1e/0xad
Jan 11 23:46:42 twister kernel: [<ffffffff80223ccc>] __dequeue_entity+0x1c/0x32
Jan 11 23:46:42 twister kernel: [<ffffffff80223d05>] set_next_entity+0x23/0x73
Jan 11 23:46:42 twister kernel: [<ffffffff8047340b>] wait_for_common+0xc4/0x129
Jan 11 23:46:42 twister kernel: [<ffffffff80224204>] default_wake_function+0x0/0xe
Jan 11 23:46:42 twister kernel: [<ffffffff80236b02>] flush_cpu_workqueue+0x50/0x58
Jan 11 23:46:42 twister kernel: [<ffffffff80236c32>] wq_barrier_func+0x0/0x9
Jan 11 23:46:42 twister kernel: [<ffffffff80236ecf>] flush_workqueue+0x9/0x12
Jan 11 23:46:42 twister kernel: [<ffffffff803449f9>] release_dev+0x47c/0x5e2
Jan 11 23:46:42 twister kernel: [<ffffffff8021b609>] do_page_fault+0x2ff/0x65a
Jan 11 23:46:42 twister kernel: [<ffffffff80344b6b>] tty_release+0xc/0x10
Jan 11 23:46:42 twister kernel: [<ffffffff80277013>] __fput+0xb1/0x16f
Jan 11 23:46:42 twister kernel: [<ffffffff80274961>] filp_close+0x5d/0x65
Jan 11 23:46:42 twister kernel: [<ffffffff80275ab3>] sys_close+0x73/0xa6
Jan 11 23:46:42 twister kernel: [<ffffffff8020b5fc>] tracesys+0xdc/0xe1
Jan 11 23:46:42 twister kernel:

If I'm not mistaken the "cat" above is the "cat /dev/zero >
/dev/rfcomm0" command.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-10 10:15:31

by Gabor Gombas

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Hi,

On Wed, Jan 09, 2008 at 06:16:02PM +0900, Tejun Heo wrote:

> Please confirm the attached patch fixes the oops. I'll separate it into
> two patches and forward them to Greg. But bluetooth code also needs to
> be updated such that it moves the refcommX device before killing the
> connection node.

The kernel survives switching the phone off & on but then oopses when
pppd is started:

Jan 10 10:59:09 twister kernel: Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
Jan 10 10:59:09 twister kernel: [<ffffffff802af03f>] sysfs_get_dentry+0x27/0x8f
Jan 10 10:59:09 twister kernel: PGD 260bb067 PUD 260bc067 PMD 0
Jan 10 10:59:09 twister kernel: Oops: 0000 [1]
Jan 10 10:59:09 twister kernel: CPU 0
Jan 10 10:59:09 twister kernel: Modules linked in: ppp_generic slhc binfmt_misc rfcomm l2cap nfsd auth_rpcgss exportfs ipt_REJECT xt_tcpudp ipt_LOG xt_limit iptable_filter ip_tables x_tables nfs lockd nfs_acl sunrpc fuse dm_crypt saa7134_alsa radeon hwmon_vid eeprom hci_usb bluetooth usb_storage tuner tea5767 tda8290 tuner_simple mt20xx tea5761 snd_intel8x0 snd_ac97_codec saa7134 firewire_ohci firewire_core sg ac97_bus videobuf_dma_sg videobuf_core ir_kbd_i2c crc_itu_t ir_common snd_pcm parport_pc parport sky2 forcedeth sr_mod snd_timer snd_page_alloc ehci_hcd ohci_hcd floppy cdrom
Jan 10 10:59:09 twister kernel: Pid: 4490, comm: pppd Not tainted 2.6.24-rc6debug-dirty #8
Jan 10 10:59:09 twister kernel: RIP: 0010:[<ffffffff802af03f>] [<ffffffff802af03f>] sysfs_get_dentry+0x27/0x8f
Jan 10 10:59:09 twister kernel: RSP: 0018:ffff8100260dbc88 EFLAGS: 00010207
Jan 10 10:59:09 twister kernel: RAX: 0000000000000000 RBX: ffff8100251577d0 RCX: 0000000000000000
Jan 10 10:59:09 twister kernel: RDX: ffff8100260e3410 RSI: ffffffff80643d70 RDI: ffff8100bb32d898
Jan 10 10:59:09 twister kernel: RBP: 0000000000000000 R08: 6390dfaa7e3bd395 R09: ffffffff8050cee6
Jan 10 10:59:09 twister kernel: R10: ffff8100260dbcb8 R11: 0000003000000018 R12: ffff8100b6ef4b90
Jan 10 10:59:09 twister kernel: R13: 00000000fffffff4 R14: ffff81004d9d2600 R15: ffff8100b6eb227e
Jan 10 10:59:09 twister kernel: FS: 00002b5731903ac0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
Jan 10 10:59:09 twister kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 10 10:59:09 twister kernel: CR2: 0000000000000008 CR3: 00000000260b1000 CR4: 00000000000006e0
Jan 10 10:59:09 twister kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 10 10:59:09 twister kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 10 10:59:09 twister kernel: Process pppd (pid: 4490, threadinfo ffff8100260da000, task ffff81009550d060)
Jan 10 10:59:09 twister kernel: Stack: ffff8100260e3690 ffff8100b6ef4b90 ffff81003a5942a0 ffffffff802af272
Jan 10 10:59:09 twister kernel: ffff8100becca2d0 ffff8100becca2d0 ffff8100260df300 ffff81003a5942a0
Jan 10 10:59:09 twister kernel: 00000000fffffff4 ffff81004d9d2600 ffff8100b6eb227e ffffffff803007a5
Jan 10 10:59:09 twister kernel: Call Trace:
Jan 10 10:59:09 twister kernel: [<ffffffff802af272>] sysfs_move_dir+0x63/0x1fd
Jan 10 10:59:09 twister kernel: [<ffffffff803007a5>] kobject_move+0xba/0x110
Jan 10 10:59:09 twister kernel: [<ffffffff803698cc>] device_move+0x59/0x111
Jan 10 10:59:09 twister kernel: [<ffffffff8827d31e>] :rfcomm:rfcomm_tty_open+0x1c1/0x1db
Jan 10 10:59:09 twister kernel: [<ffffffff80224204>] default_wake_function+0x0/0xe
Jan 10 10:59:09 twister kernel: [<ffffffff80345a2d>] tty_open+0x1c5/0x2af
Jan 10 10:59:09 twister kernel: [<ffffffff80278590>] chrdev_open+0x113/0x132
Jan 10 10:59:09 twister kernel: [<ffffffff8027847d>] chrdev_open+0x0/0x132
Jan 10 10:59:09 twister kernel: [<ffffffff80274b4a>] __dentry_open+0xda/0x1aa
Jan 10 10:59:09 twister kernel: [<ffffffff80274cd1>] do_filp_open+0x2d/0x3d
Jan 10 10:59:09 twister kernel: [<ffffffff80473812>] mutex_lock+0xd/0x1d
Jan 10 10:59:09 twister kernel: [<ffffffff802749d6>] get_unused_fd_flags+0x6d/0x100
Jan 10 10:59:09 twister kernel: [<ffffffff80274d27>] do_sys_open+0x46/0xc3
Jan 10 10:59:09 twister kernel: [<ffffffff8020b5fc>] tracesys+0xdc/0xe1
Jan 10 10:59:09 twister kernel:
Jan 10 10:59:09 twister kernel:
Jan 10 10:59:09 twister kernel: Code: 48 8b 45 08 48 39 d0 74 05 48 89 c5 eb f2 48 8b 7b 08 48 81
Jan 10 10:59:09 twister kernel: RIP [<ffffffff802af03f>] sysfs_get_dentry+0x27/0x8f
Jan 10 10:59:09 twister kernel: RSP <ffff8100260dbc88>
Jan 10 10:59:09 twister kernel: CR2: 0000000000000008
Jan 10 10:59:09 twister kernel: ---[ end trace b33fe66800773e0c ]---

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-10 01:11:17

by Dave Young

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Wed, Jan 09, 2008 at 06:16:02PM +0900, Tejun Heo wrote:
> Hello,
>
> My laptop and cell finally decided to talk to each other and I could
> reproduce the bug here. The attached patch should remove the oops.
>
> The bug is two folded. I just skimmed through the bluetooth code and am
> very likely to wrong at places. Please correct me where I'm wrong.
>
> 1. It's introduced by class device migration and the bug will go away
> with CONFIG_SYSFS_DEPRECATED turned on. With CONFIG_SYSFS_DEPRECATED
> turned off, what used to be class devices live under actual devices.
> For rfcomm, this means the rfcommN tty device now lives under the
> aclXXXX node (probably represents an active connection?) instead of the
> class directory.
>
> It seems rfcommN devices are supposed to survive over disconnects so the
> rfcommN device moves under the live connection while connection is alive
> and retreats back to a default directory when the connection is lost.
> This is all fine and dandy as long as the rfcommN device lives under
> class directory as class directory never goes away.
>
> However, with recent sysfs updates, rfcommN now lives directly under the
> aclXXXX node. If the connection goes away while rfcommN device is under
> it, it gets deleted but rfcommN is still treated as alive.
>
> This isn't supported. sysfs doesn't allow parents to die first and the
> dangling children to be salvaged using sysfs_move().
>
> 2. Which in turn exposes three bugs in sysfs
> - sysfs_lookup() returning NULL on negative lookup. sysfs
> code requires that all negative dentries are shot down.
> lookup should return -ENOENT instead of NULL.
> - in move and rename, error handling is wrong. It ends up
> passing ERR_PTR() values to dput().
> - there was an extra dput() in sysfs_move_dir().
>
> The attached patch fixes all sysfs bugs and removes the oops. However,
> rfcommX moving is still broken. The rfcommX device won't be visible
> from sysfs tree after the initial move failure and all following moves
> will fail.
>
> Please confirm the attached patch fixes the oops. I'll separate it into
> two patches and forward them to Greg. But bluetooth code also needs to
> be updated such that it moves the refcommX device before killing the
> connection node.
For bluetooth device_move, the only child device of hci_conn dev is the rfcomm tty_dev. How about the following patch, please verify :

---
net/bluetooth/hci_sysfs.c | 7 +++++++
1 file changed, 7 insertions(+)

diff -upr linux/net/bluetooth/hci_sysfs.c linux.new/net/bluetooth/hci_sysfs.c
--- linux/net/bluetooth/hci_sysfs.c 2008-01-10 09:01:51.000000000 +0800
+++ linux.new/net/bluetooth/hci_sysfs.c 2008-01-10 09:01:22.000000000 +0800
@@ -316,9 +316,16 @@ void hci_conn_add_sysfs(struct hci_conn
schedule_work(&conn->work);
}

+static int hci_conn_move_child(struct device *dev, void *data)
+{
+ device_move(dev, NULL);
+ return 0;
+}
+
static void del_conn(struct work_struct *work)
{
struct hci_conn *conn = container_of(work, struct hci_conn, work);
+ device_for_each_child(&conn->dev, NULL, hci_conn_move_child);
device_del(&conn->dev);
put_device(&conn->dev);
}

>
> Thanks.
>
> --
> tejun

> diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
> diff --git a/fs/dcache.c b/fs/dcache.c
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index 3371629..f281cc6 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -678,8 +678,10 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
> sd = sysfs_find_dirent(parent_sd, dentry->d_name.name);
>
> /* no such entry */
> - if (!sd)
> + if (!sd) {
> + ret = ERR_PTR(-ENOENT);
> goto out_unlock;
> + }
>
> /* attach dentry and inode */
> inode = sysfs_get_inode(sd);
> @@ -781,6 +783,7 @@ int sysfs_rename_dir(struct kobject * kobj, const char *new_name)
> old_dentry = sysfs_get_dentry(sd);
> if (IS_ERR(old_dentry)) {
> error = PTR_ERR(old_dentry);
> + old_dentry = NULL;
> goto out;
> }
>
> @@ -848,6 +851,7 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
> old_dentry = sysfs_get_dentry(sd);
> if (IS_ERR(old_dentry)) {
> error = PTR_ERR(old_dentry);
> + old_dentry = NULL;
> goto out;
> }
> old_parent = old_dentry->d_parent;
> @@ -855,6 +859,7 @@ int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
> new_parent = sysfs_get_dentry(new_parent_sd);
> if (IS_ERR(new_parent)) {
> error = PTR_ERR(new_parent);
> + new_parent = NULL;
> goto out;
> }
>
> @@ -878,7 +883,6 @@ again:
> error = 0;
> d_add(new_dentry, NULL);
> d_move(old_dentry, new_dentry);
> - dput(new_dentry);
>
> /* Remove from old parent's list and insert into new parent's list. */
> sysfs_unlink_sibling(sd);
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h


-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-08 13:32:16

by Gabor Gombas

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Tue, Jan 08, 2008 at 06:42:43PM +0900, Tejun Heo wrote:

> Thanks. Please apply the attached patch and report the oops.

Jan 8 14:23:29 twister kernel: XXX: moving /devices/virtual/tty/rfcomm0 under /devices/pci0000:00/0000:00:02.0/usb1/1-3/1-3:1.0/hci0/acl001BAFE1624D/tty
Jan 8 14:23:29 twister kernel: XXX: looking up / (ffff8100bf400000) / devices
Jan 8 14:23:29 twister kernel: XXX: looking up devices (ffff8100bf658cc0) / virtual
Jan 8 14:23:29 twister kernel: XXX: looking up virtual (ffff8100bf417220) / tty
Jan 8 14:23:29 twister kernel: XXX: looking up tty (ffff8100bb31c440) / rfcomm0
Jan 8 14:23:29 twister kernel: XXX: looking up / (ffff8100bf400000) / devices
Jan 8 14:23:29 twister kernel: XXX: looking up devices (ffff8100bf658cc0) / pci0000:00
Jan 8 14:23:29 twister kernel: XXX: looking up pci0000:00 (ffff8100bf658880) / 0000:00:02.0
Jan 8 14:23:29 twister kernel: XXX: looking up 0000:00:02.0 (ffff8100bb2abcc0) / usb1
Jan 8 14:23:29 twister kernel: XXX: looking up usb1 (ffff8100bb306000) / 1-3
Jan 8 14:23:29 twister kernel: XXX: looking up 1-3 (ffff8100bb309220) / 1-3:1.0
Jan 8 14:23:29 twister kernel: XXX: looking up 1-3:1.0 (ffff8100bb309880) / hci0
Jan 8 14:23:29 twister kernel: XXX: looking up hci0 (ffff8100bb318aa0) / acl001BAFE1624D
Jan 8 14:23:29 twister kernel: XXX: looking up acl001BAFE1624D (ffff8100b555a000) / tty
Jan 8 14:23:52 twister kernel: XXX: moving /devices/pci0000:00/0000:00:02.0/usb1/1-3/1-3:1.0/hci0/acl001BAFE1624D/tty/rfcomm0 under /devices/virtual/tty
Jan 8 14:23:52 twister kernel: XXX: looking up / (ffff8100bf400000) / devices
Jan 8 14:23:52 twister kernel: XXX: looking up devices (ffff8100bf658cc0) / pci0000:00
Jan 8 14:23:52 twister kernel: XXX: looking up pci0000:00 (ffff8100bf658880) / 0000:00:02.0
Jan 8 14:23:52 twister kernel: XXX: looking up 0000:00:02.0 (ffff8100bb2abcc0) / usb1
Jan 8 14:23:52 twister kernel: XXX: looking up usb1 (ffff8100bb306000) / 1-3
Jan 8 14:23:52 twister kernel: XXX: looking up 1-3 (ffff8100bb309220) / 1-3:1.0
Jan 8 14:23:52 twister kernel: XXX: looking up 1-3:1.0 (ffff8100bb309880) / hci0
Jan 8 14:23:52 twister kernel: XXX: looking up hci0 (ffff8100bb318aa0) / acl001BAFE1624D
Jan 8 14:23:52 twister kernel: XXX: looking up acl001BAFE1624D (0000000000000000) /
Jan 8 14:23:52 twister kernel: Unable to handle kernel NULL pointer dereference at 00000000000000b8 RIP:
Jan 8 14:23:52 twister kernel: [<ffffffff80473875>] mutex_lock+0x10/0x1d
Jan 8 14:23:52 twister kernel: PGD b19b6067 PUD b1969067 PMD 0
Jan 8 14:23:52 twister kernel: Oops: 0002 [1]
Jan 8 14:23:52 twister kernel: CPU 0
Jan 8 14:23:52 twister kernel: Modules linked in: binfmt_misc rfcomm l2cap nfsd auth_rpcgss exportfs ipt_REJECT xt_tcpudp ipt_LOG xt_limit iptable_filter ip_tables x_tables nfs lockd nfs_acl sunrpc fuse dm_crypt saa7134_alsa radeon hwmon_vid eeprom hci_usb bluetooth usb_storage tuner tea5767 tda8290 tuner_simple mt20xx tea5761 snd_intel8x0 sg snd_ac97_codec ac97_bus saa7134 firewire_ohci firewire_core crc_itu_t videobuf_dma_sg videobuf_core ir_kbd_i2c snd_pcm ir_common sky2 forcedeth ehci_hcd snd_timer snd_page_alloc ohci_hcd sr_mod parport_pc parport cdrom floppy
Jan 8 14:23:52 twister kernel: Pid: 3228, comm: cat Not tainted 2.6.24-rc6debug-dirty #7
Jan 8 14:23:52 twister kernel: RIP: 0010:[<ffffffff80473875>] [<ffffffff80473875>] mutex_lock+0x10/0x1d
Jan 8 14:23:52 twister kernel: RSP: 0018:ffff8100b19a5d08 EFLAGS: 00010246
Jan 8 14:23:52 twister kernel: RAX: 0000000000000000 RBX: 00000000000000b8 RCX: 0000000000010352
Jan 8 14:23:52 twister kernel: RDX: ffff8100b19a5fd8 RSI: 0000000000010352 RDI: 00000000000000b8
Jan 8 14:23:52 twister kernel: RBP: ffffffff80593db0 R08: 0000000000000000 R09: 0000000000000097
Jan 8 14:23:52 twister kernel: R10: 000000000000000e R11: ffffffff803103d4 R12: ffff8100ba312b90
Jan 8 14:23:52 twister kernel: R13: ffff8100bfa78d00 R14: ffff8100b1980b00 R15: ffff8100b6c6b840
Jan 8 14:23:52 twister kernel: FS: 00002b3bd213d6e0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
Jan 8 14:23:52 twister kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 8 14:23:52 twister kernel: CR2: 00000000000000b8 CR3: 00000000b19ae000 CR4: 00000000000006e0
Jan 8 14:23:52 twister kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 8 14:23:52 twister kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 8 14:23:52 twister kernel: Process cat (pid: 3228, threadinfo ffff8100b19a4000, task ffff8100b18d9060)
Jan 8 14:23:52 twister kernel: Stack: ffff8100bfa78d00 ffffffff8028560d ffff8100b554ca28 ffffffff802ae9d5
Jan 8 14:23:52 twister kernel: ffff8100bf9f4fa0 ffff8100ba312b90 ffff8100b5326c60 ffffffff802af2de
Jan 8 14:23:52 twister kernel: ffff8100b6ca4cd0 ffff8100b6ca4cd0 ffff8100bfa78d00 ffff8100b5326ba0
Jan 8 14:23:52 twister kernel: Call Trace:
Jan 8 14:23:52 twister kernel: [<ffffffff8028560d>] dput+0x26/0x103
Jan 8 14:23:52 twister kernel: [<ffffffff802ae9d5>] __sysfs_get_dentry+0x5f/0xa9
Jan 8 14:23:52 twister kernel: [<ffffffff802af2de>] sysfs_move_dir+0xae/0x247
Jan 8 14:23:52 twister kernel: [<ffffffff80300805>] kobject_move+0xba/0x110
Jan 8 14:23:52 twister kernel: [<ffffffff8036992c>] device_move+0x59/0x111
Jan 8 14:23:52 twister kernel: [<ffffffff8827c433>] :rfcomm:rfcomm_tty_close+0x2f/0x74
Jan 8 14:23:52 twister kernel: [<ffffffff803447ef>] release_dev+0x212/0x5e2
Jan 8 14:23:52 twister kernel: [<ffffffff8021b609>] do_page_fault+0x2ff/0x65a
Jan 8 14:23:52 twister kernel: [<ffffffff80344bcb>] tty_release+0xc/0x10
Jan 8 14:23:52 twister kernel: [<ffffffff80277013>] __fput+0xb1/0x16f
Jan 8 14:23:52 twister kernel: [<ffffffff80274961>] filp_close+0x5d/0x65
Jan 8 14:23:52 twister kernel: [<ffffffff80275ab3>] sys_close+0x73/0xa6
Jan 8 14:23:52 twister kernel: [<ffffffff8020b5fc>] tracesys+0xdc/0xe1
Jan 8 14:23:52 twister kernel:
Jan 8 14:23:52 twister kernel:
Jan 8 14:23:52 twister kernel: Code: ff 0f 79 05 e8 c9 00 00 00 58 5a 5b c3 41 54 48 8d 47 08 48
Jan 8 14:23:52 twister kernel: RIP [<ffffffff80473875>] mutex_lock+0x10/0x1d
Jan 8 14:23:52 twister kernel: RSP <ffff8100b19a5d08>
Jan 8 14:23:52 twister kernel: CR2: 00000000000000b8
Jan 8 14:23:52 twister kernel: ---[ end trace d7aa1b498ca58476 ]---

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 21:00:24

by Gabor Gombas

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Tue, Jan 08, 2008 at 12:24:05AM +0900, Tejun Heo wrote:

> Does the attached patch fix the problem?

No, it still oopses.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 14:13:00

by Gabor Gombas

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Hi,

On Sat, Jan 05, 2008 at 07:50:39AM +0000, Al Viro wrote:

> Could you stick
> if (!parent->d_inode)
> printk(KERN_WARNING "sysfs locking blows: %s",
> parent->d_name.name);
> right before
> mutex_lock(&parent->d_inode->i_mutex);
> dentry = lookup_one_noperm(cur->s_name, parent);
> mutex_unlock(&parent->d_inode->i_mutex);
> in sysfs_get_dentry() (fs/sysfs/dir.c) and verify that it does, indeed,
> trigger?

Here it is:

Jan 7 14:35:43 twister kernel: sysfs locking blows: acl001BAFE1624D<1>Unable to handle kernel NULL pointer dereference at 00000000000000b8 RIP:
Jan 7 14:35:43 twister kernel: [<ffffffff80473835>] mutex_lock+0x10/0x1d
Jan 7 14:35:43 twister kernel: PGD b6031067 PUD b609c067 PMD 0
Jan 7 14:35:43 twister kernel: Oops: 0002 [1]
Jan 7 14:35:43 twister kernel: CPU 0
Jan 7 14:35:43 twister kernel: Modules linked in: binfmt_misc rfcomm l2cap nfsd auth_rpcgss exportfs ipt_REJECT xt_tcpudp ipt_LOG xt_limit iptable_filter ip_tables x_tables nfs lockd nfs_acl sunrpc fuse dm_crypt saa7134_alsa radeon hwmon_vid eeprom hci_usb bluetooth usb_storage tuner tea5767 tda8290 tuner_simple mt20xx tea5761 parport_pc parport floppy snd_intel8x0 snd_ac97_codec saa7134 ac97_bus firewire_ohci firewire_core videobuf_dma_sg videobuf_core ir_kbd_i2c sky2 ir_common crc_itu_t snd_pcm forcedeth snd_timer snd_page_alloc ehci_hcd ohci_hcd sg sr_mod cdrom
Jan 7 14:35:43 twister kernel: Pid: 3218, comm: cat Not tainted 2.6.24-rc6debug-dirty #5
Jan 7 14:35:43 twister kernel: RIP: 0010:[<ffffffff80473835>] [<ffffffff80473835>] mutex_lock+0x10/0x1d
Jan 7 14:35:43 twister kernel: RSP: 0018:ffff8100b5733d08 EFLAGS: 00010246
Jan 7 14:35:43 twister kernel: RAX: 0000000000000000 RBX: 00000000000000b8 RCX: 0000000000000001
Jan 7 14:35:43 twister kernel: RDX: ffff8100b5733fd8 RSI: ffff8100bfab6000 RDI: 00000000000000b8
Jan 7 14:35:43 twister kernel: RBP: ffffffff80593db0 R08: ffffffff8057b118 R09: ffffffff806093d0
Jan 7 14:35:43 twister kernel: R10: ffff8100b6074aa0 R11: ffff81000101b100 R12: ffff8100b6074aa0
Jan 7 14:35:43 twister kernel: R13: 00000000fffffff4 R14: ffff8100b571fb80 R15: ffff8100b573e000
Jan 7 14:35:43 twister kernel: FS: 00002aead9fc56e0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
Jan 7 14:35:43 twister kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 7 14:35:43 twister kernel: CR2: 00000000000000b8 CR3: 00000000b60b2000 CR4: 00000000000006e0
Jan 7 14:35:43 twister kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 7 14:35:43 twister kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 7 14:35:43 twister kernel: Process cat (pid: 3218, threadinfo ffff8100b5732000, task ffff8100bfa70830)
Jan 7 14:35:43 twister kernel: Stack: ffffe2000007e310 ffffffff8028560d ffff8100b202a7d0 ffffffff802af076
Jan 7 14:35:43 twister kernel: ffff8100bfa52f00 ffff8100b6074aa0 ffff8100b618c720 ffffffff802af28b
Jan 7 14:35:43 twister kernel: ffff8100b61308d0 ffff8100b61308d0 ffff8100bfa1a800 ffff8100b618c720
Jan 7 14:35:43 twister kernel: Call Trace:
Jan 7 14:35:43 twister kernel: [<ffffffff8028560d>] dput+0x26/0x103
Jan 7 14:35:43 twister kernel: [<ffffffff802af076>] sysfs_get_dentry+0x5e/0xa8
Jan 7 14:35:43 twister kernel: [<ffffffff802af28b>] sysfs_move_dir+0x63/0x204
Jan 7 14:35:43 twister kernel: [<ffffffff803007c5>] kobject_move+0xba/0x110
Jan 7 14:35:43 twister kernel: [<ffffffff803698ec>] device_move+0x59/0x111
Jan 7 14:35:43 twister kernel: [<ffffffff8827c433>] :rfcomm:rfcomm_tty_close+0x2f/0x74
Jan 7 14:35:43 twister kernel: [<ffffffff803447af>] release_dev+0x212/0x5e2
Jan 7 14:35:43 twister kernel: [<ffffffff8021b609>] do_page_fault+0x2ff/0x65a
Jan 7 14:35:43 twister kernel: [<ffffffff80344b8b>] tty_release+0xc/0x10
Jan 7 14:35:43 twister kernel: [<ffffffff80277013>] __fput+0xb1/0x16f
Jan 7 14:35:43 twister kernel: [<ffffffff80274961>] filp_close+0x5d/0x65
Jan 7 14:35:43 twister kernel: [<ffffffff80275ab3>] sys_close+0x73/0xa6
Jan 7 14:35:43 twister kernel: [<ffffffff8020b5fc>] tracesys+0xdc/0xe1
Jan 7 14:35:43 twister kernel:
Jan 7 14:35:43 twister kernel:
Jan 7 14:35:43 twister kernel: Code: ff 0f 79 05 e8 c9 00 00 00 58 5a 5b c3 41 54 48 8d 47 08 48
Jan 7 14:35:43 twister kernel: RIP [<ffffffff80473835>] mutex_lock+0x10/0x1d
Jan 7 14:35:43 twister kernel: RSP <ffff8100b5733d08>
Jan 7 14:35:43 twister kernel: CR2: 00000000000000b8
Jan 7 14:35:43 twister kernel: ---[ end trace 0e3a1176e9294792 ]---

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 14:10:15

by Gabor Gombas

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Hi,

On Fri, Jan 04, 2008 at 09:05:42AM +0800, Dave Young wrote:

> Could you try the patch (on 2.6.24-rc6) following and check the debug messages?
>
> diff -upr linux/net/bluetooth/rfcomm/tty.c linux.new/net/bluetooth/rfcomm/tty.c
> --- linux/net/bluetooth/rfcomm/tty.c 2008-01-04 08:58:48.000000000 +0800
> +++ linux.new/net/bluetooth/rfcomm/tty.c 2008-01-04 09:01:01.000000000 +0800
> @@ -313,6 +313,7 @@ static void rfcomm_dev_del(struct rfcomm
> {
> BT_DBG("dev %p", dev);
>
> + dump_stack();
> set_bit(RFCOMM_TTY_RELEASED, &dev->flags);
> rfcomm_dev_put(dev);
> }

This did not trigger.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 10:33:38

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Al Viro <[email protected]> writes:

> On Mon, Jan 07, 2008 at 06:18:21PM +0900, Tejun Heo wrote:
>> Tejun Heo wrote:
>> > Eric W. Biederman wrote:
>> >>> That said, the mechanism is a bit too fragile. sysfs currently ensures
>> >>> that dentry/inode point to the associated sysfs_dirent. This is mainly
>> >>> remanent of conversion from previous VFS based implementation. I think
>> >>> the right thing to do here is to make sysfs behave like other proper
>> >>> distributed filesystems using d_revalidate.
>> >> Huh? We still need something like sysfs_get_dentry to find the dentries
>> >> for the rename or move operation. So we can call d_move.
>> >
>> > Ah... right. Thanks. :-)
>>
>> On the second thought, can't those too be dealt with d_revalidate?
>
> FVO "dealt with" as pleasant and efficient as using coarse whetstone
> to deal with caries.

Or to say it another way. The linux VFS requires dentries to be
preserved and used as long as we can. dropping them early causes
some weird nasties to show up, in particular it totally messes up
mounting other filesystems on top.

Eric

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 09:22:56

by Al Viro

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Mon, Jan 07, 2008 at 06:18:21PM +0900, Tejun Heo wrote:
> Tejun Heo wrote:
> > Eric W. Biederman wrote:
> >>> That said, the mechanism is a bit too fragile. sysfs currently ensures
> >>> that dentry/inode point to the associated sysfs_dirent. This is mainly
> >>> remanent of conversion from previous VFS based implementation. I think
> >>> the right thing to do here is to make sysfs behave like other proper
> >>> distributed filesystems using d_revalidate.
> >> Huh? We still need something like sysfs_get_dentry to find the dentries
> >> for the rename or move operation. So we can call d_move.
> >
> > Ah... right. Thanks. :-)
>
> On the second thought, can't those too be dealt with d_revalidate?

FVO "dealt with" as pleasant and efficient as using coarse whetstone
to deal with caries.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 09:18:21

by Tejun Heo

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Tejun Heo wrote:
> Eric W. Biederman wrote:
>>> That said, the mechanism is a bit too fragile. sysfs currently ensures
>>> that dentry/inode point to the associated sysfs_dirent. This is mainly
>>> remanent of conversion from previous VFS based implementation. I think
>>> the right thing to do here is to make sysfs behave like other proper
>>> distributed filesystems using d_revalidate.
>> Huh? We still need something like sysfs_get_dentry to find the dentries
>> for the rename or move operation. So we can call d_move.
>
> Ah... right. Thanks. :-)

On the second thought, can't those too be dealt with d_revalidate?

--
tejun

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 09:17:20

by Tejun Heo

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Eric W. Biederman wrote:
>> That said, the mechanism is a bit too fragile. sysfs currently ensures
>> that dentry/inode point to the associated sysfs_dirent. This is mainly
>> remanent of conversion from previous VFS based implementation. I think
>> the right thing to do here is to make sysfs behave like other proper
>> distributed filesystems using d_revalidate.
>
> Huh? We still need something like sysfs_get_dentry to find the dentries
> for the rename or move operation. So we can call d_move.

Ah... right. Thanks. :-)

--
tejun

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 08:21:40

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Tejun Heo <[email protected]> writes:

> Hello,
>
> Tejun Heo wrote:
>> Al Viro wrote:
>>> On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
>>>>> Assuming that this is what we get, everything looks explainable - we
>>>>> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
>>>>> gets evicted. We don't have any exclusion, so while we are playing
>>>>> silly buggers with lookups in sysfs_get_dentry() we have parent become
>>>>> negative; the rest is obvious...
>>>> That part of code is walking down the sysfs tree from the s_root of
>>>> sysfs hierarchy and on each step parent is held using dget() while being
>>>> referenced, so I don't think they can turn negative there.
>>> Turn? Just what stops you from getting a negative (and unhashed) from
>>> lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?
>>
>> Right, I haven't thought about that. When sysfs_get_dentry() is called,
>> @sd is always valid so unless there was existing negative dentry, lookup
>> is guaranteed to return positive dentry, but by populating dcache with
>> negative dentry before a node is created, things can go wrong. I don't
>> think that's what's going on here tho. If that was the case, the
>> while() loop looking up the next sd to lookup (@cur) should have blown
>> up as negative dentry will have NULL d_fsdata which doesn't match any sd.
>>
>> I guess what's needed here is d_revalidate() as other distributed
>> filesystems do. I'll test whether this can be actually triggered and
>> prepare a fix. Thanks a lot for pointing out the problem.
>
> This can't happen because lookup of non-existent entry doesn't create a
> negative dentry. The new dentry is never hashed and killed after lookup
> failure, the above scenario can't happen.
>
> That said, the mechanism is a bit too fragile. sysfs currently ensures
> that dentry/inode point to the associated sysfs_dirent. This is mainly
> remanent of conversion from previous VFS based implementation. I think
> the right thing to do here is to make sysfs behave like other proper
> distributed filesystems using d_revalidate.

Huh? We still need something like sysfs_get_dentry to find the dentries
for the rename or move operation. So we can call d_move.

Further we should be talking about sysfs_move_dir not sysfs_rename_dir
as only the networking code calls sysfs_rename_dir. Not that it is
significantly different.

I think I saw a change in lock ordering recently to help distributed
filesystems, and maybe that will simplify some of this.

Anyway I figure if I am to understand this I better go look and see if
I can find the start of this thread. I don't have a clue
where we are blowing up.

Eric

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 08:07:38

by Tejun Heo

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Dave Young wrote:
> On Thu, Jan 03, 2008 at 02:16:21PM +0100, Gabor Gombas wrote:
>> On Wed, Jan 02, 2008 at 04:16:42PM +0100, Gabor Gombas wrote:
>>
>>> So the patch referenced above does not help. But I've found a very easy
>>> way to trigger the bug:
>>>
>>> - do a "cat /dev/zero > /dev/rfcomm0"
>>> - switch the phone off
>>> - switch the phone on, and the kernel oopses
>> FYI I also remember having oopses with 2.6.22.<something> but I do not
>> have those logs anymore. Also I now booted 2.6.20.7 and it did not oops.
>
> Could you try the patch (on 2.6.24-rc6) following and check the debug messages?
>
> diff -upr linux/net/bluetooth/rfcomm/tty.c linux.new/net/bluetooth/rfcomm/tty.c
> --- linux/net/bluetooth/rfcomm/tty.c 2008-01-04 08:58:48.000000000 +0800
> +++ linux.new/net/bluetooth/rfcomm/tty.c 2008-01-04 09:01:01.000000000 +0800
> @@ -313,6 +313,7 @@ static void rfcomm_dev_del(struct rfcomm
> {
> BT_DBG("dev %p", dev);
>
> + dump_stack();
> set_bit(RFCOMM_TTY_RELEASED, &dev->flags);
> rfcomm_dev_put(dev);
> }

Tried to reproduce the problem here but I can't persuade my laptop to
talk to my cell. :-( Gabor, can you please report what Dave Young asked?

Thanks.

--
tejun

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-07 02:37:45

by Tejun Heo

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Hello,

Tejun Heo wrote:
> Al Viro wrote:
>> On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
>>>> Assuming that this is what we get, everything looks explainable - we
>>>> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
>>>> gets evicted. We don't have any exclusion, so while we are playing
>>>> silly buggers with lookups in sysfs_get_dentry() we have parent become
>>>> negative; the rest is obvious...
>>> That part of code is walking down the sysfs tree from the s_root of
>>> sysfs hierarchy and on each step parent is held using dget() while being
>>> referenced, so I don't think they can turn negative there.
>> Turn? Just what stops you from getting a negative (and unhashed) from
>> lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?
>
> Right, I haven't thought about that. When sysfs_get_dentry() is called,
> @sd is always valid so unless there was existing negative dentry, lookup
> is guaranteed to return positive dentry, but by populating dcache with
> negative dentry before a node is created, things can go wrong. I don't
> think that's what's going on here tho. If that was the case, the
> while() loop looking up the next sd to lookup (@cur) should have blown
> up as negative dentry will have NULL d_fsdata which doesn't match any sd.
>
> I guess what's needed here is d_revalidate() as other distributed
> filesystems do. I'll test whether this can be actually triggered and
> prepare a fix. Thanks a lot for pointing out the problem.

This can't happen because lookup of non-existent entry doesn't create a
negative dentry. The new dentry is never hashed and killed after lookup
failure, the above scenario can't happen.

That said, the mechanism is a bit too fragile. sysfs currently ensures
that dentry/inode point to the associated sysfs_dirent. This is mainly
remanent of conversion from previous VFS based implementation. I think
the right thing to do here is to make sysfs behave like other proper
distributed filesystems using d_revalidate.

Thanks.

--
tejun

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-06 03:35:37

by Al Viro

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Sun, Jan 06, 2008 at 11:54:43AM +0900, Tejun Heo wrote:
> That means sysfs_remove_dir() is called on parent while other operations
> are in progress on children, right? sysfs has never allowed such things
> && AFAIK no one does that. It's somewhat implied in the interface (such
> as recursive removing) but I fully agree it's problematic. Things like
> these are why I think we need to unify/simplify locking as I wrote
> previously.

All it takes is kobject_rename() or kobject_move() called asynchronously
wrt removal... I don't see an explicit ban for that.

FWIW, what happens here *is* fishy, but I don't see an outright ban on
that in documentation - rfcomm_tty_open() does
device_move(dev->tty_dev, rfcomm_get_device(dev));
when we get openers, rfcomm_tty_close() does
device_move(dev->tty_dev, NULL);
when the number of openers hits zero. Can happen repeatedly.

Note that device_move() with new parent being NULL is explicitly allowed
and handled, so...

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-06 02:54:43

by Tejun Heo

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Hello, Al Viro.

Al Viro wrote:
> On Sun, Jan 06, 2008 at 11:07:52AM +0900, Tejun Heo wrote:
>
>> Right, I haven't thought about that. When sysfs_get_dentry() is called,
>> @sd is always valid so unless there was existing negative dentry, lookup
>> is guaranteed to return positive dentry, but by populating dcache with
>> negative dentry before a node is created, things can go wrong. I don't
>> think that's what's going on here tho. If that was the case, the
>> while() loop looking up the next sd to lookup (@cur) should have blown
>> up as negative dentry will have NULL d_fsdata which doesn't match any sd.
>
> No. What happens if sd gets unlinked while we are on the way to
> sysfs_get_dentry() and so does its parent?

That means sysfs_remove_dir() is called on parent while other operations
are in progress on children, right? sysfs has never allowed such things
&& AFAIK no one does that. It's somewhat implied in the interface (such
as recursive removing) but I fully agree it's problematic. Things like
these are why I think we need to unify/simplify locking as I wrote
previously.

> The parent is off the
> sibling list, we get negative dentry from lookup. It's not hashed,
> so won't stick around in dcache (which is apparently what you are
> thinking about). However, _THIS_ lookup has returned you a dentry
> with NULL ->d_inode and you are well and truly buggered.

So, the assumption is that while @sd is held and being operated on its
ancestry can never change except for rename and move and those are
protected with sysfs_rename_mutex.

I agree that the locking is awkward but sysfs's internal implementation
has completely changed over last six or so months and the conversion
still isn't complete. It isn't pretty at all but is at least
understandable now, IMHO. :-)

The last pending change is separating out kobject from sysfs and
simplifying locking. I had some disagreement with Greg about the future
of kobject and then there was tsunami of ATA bugs because most major
distros converted to libata from IDE in the second half of the last year
and I'm still buried under it. I'll get back to sysfs once these ATA
bugs subside a bit.

Thanks.

--
tejun

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-06 02:18:26

by Al Viro

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Sun, Jan 06, 2008 at 11:07:52AM +0900, Tejun Heo wrote:

> Right, I haven't thought about that. When sysfs_get_dentry() is called,
> @sd is always valid so unless there was existing negative dentry, lookup
> is guaranteed to return positive dentry, but by populating dcache with
> negative dentry before a node is created, things can go wrong. I don't
> think that's what's going on here tho. If that was the case, the
> while() loop looking up the next sd to lookup (@cur) should have blown
> up as negative dentry will have NULL d_fsdata which doesn't match any sd.

No. What happens if sd gets unlinked while we are on the way to
sysfs_get_dentry() and so does its parent? The parent is off the
sibling list, we get negative dentry from lookup. It's not hashed,
so won't stick around in dcache (which is apparently what you are
thinking about). However, _THIS_ lookup has returned you a dentry
with NULL ->d_inode and you are well and truly buggered.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-06 02:07:52

by Tejun Heo

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Hello,

Al Viro wrote:
> On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
>>> Assuming that this is what we get, everything looks explainable - we
>>> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
>>> gets evicted. We don't have any exclusion, so while we are playing
>>> silly buggers with lookups in sysfs_get_dentry() we have parent become
>>> negative; the rest is obvious...
>> That part of code is walking down the sysfs tree from the s_root of
>> sysfs hierarchy and on each step parent is held using dget() while being
>> referenced, so I don't think they can turn negative there.
>
> Turn? Just what stops you from getting a negative (and unhashed) from
> lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?

Right, I haven't thought about that. When sysfs_get_dentry() is called,
@sd is always valid so unless there was existing negative dentry, lookup
is guaranteed to return positive dentry, but by populating dcache with
negative dentry before a node is created, things can go wrong. I don't
think that's what's going on here tho. If that was the case, the
while() loop looking up the next sd to lookup (@cur) should have blown
up as negative dentry will have NULL d_fsdata which doesn't match any sd.

I guess what's needed here is d_revalidate() as other distributed
filesystems do. I'll test whether this can be actually triggered and
prepare a fix. Thanks a lot for pointing out the problem.

>>> AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
>>> and the way it plays with fs/namei.c are ucking fugly.
>> Can you elaborate a bit? The locking in sysfs is unconventional but
>> that's mostly from necessity. It has dual interface - vfs and driver
>> model && vfs data structures (dentry and inode) are too big to always
>> keep around, so it basically becomes a small distributed file system
>> where the backing data can change asynchronously.
>
> ... with all fun that creates. As it is, you have those async changers
> of backing data using VFS locking _under_ sysfs locks via lookup_one_noperm()
> and yet it needs sysfs_mutex inside sysfs_lookup(). So you can't have
> sysfs_get_dentry() under it. So you don't have exclusion with arseloads
> of sysfs tree changes in there. Joy...

There are two locks. sysfs_rename_mutex and sysfs_mutex.
sysfs_rename_mutex is above VFS locks while sysfs_mutex is below VFS
locks. sysfs_rename_mutex() protects against move/rename which can
change the ancestry of a held sysfs_dirent while sysfs_mutex protects
the sd hierarchy itself. Locking can be wrong if sysfs_rename_mutex
locking is missing from the places where ancestry of a held sd can
change but I can't find one ATM. If I'm missing your point again, feel
free to scream at me. :-)

As it's unnecessarily unintuitive, there's a pending change to rename
sysfs_rename_mutex and use it to protect the whole tree structure to
make locking simpler while using sysfs_mutex to guard VFS access such
that the locking hierarchy plainly becomes sysfs_rename_mutex - VFS
locks - sysfs_mutex where all internal sysfs structure is protected by
the outer mutex and the inner one just protects VFS accesses.

> Frankly, with the current state of sysfs the last vestiges of arguments
> used to push it into the tree back then are dead and buried. I'm not
> blaming you, BTW - the shitpile *did* grow past the point where its
> memory footprint became far too large and something needed to be done.
> Unfortunately, it happened too late for that something being "get rid
> of the entire mess" and now we are saddled with it for good.

Yeah, it's too late to get rid of sysfs and regardless implementation
ugliness, which BTW I think has improved a lot during last six or so
months, it's now pretty useful and important to drivers, so I guess the
only option is trying hard to make it better.

Oh, BTW, the ugly lookup_one_noperm() can be removed if LOOKUP_NOPERM
flag is added. The only reason sysfs_lookup() uses the specialized
lookup is to avoid permission check.

Thanks.

--
tejun

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-05 19:45:11

by Al Viro

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
> > Assuming that this is what we get, everything looks explainable - we
> > have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
> > gets evicted. We don't have any exclusion, so while we are playing
> > silly buggers with lookups in sysfs_get_dentry() we have parent become
> > negative; the rest is obvious...
>
> That part of code is walking down the sysfs tree from the s_root of
> sysfs hierarchy and on each step parent is held using dget() while being
> referenced, so I don't think they can turn negative there.

Turn? Just what stops you from getting a negative (and unhashed) from
lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?

> > AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
> > and the way it plays with fs/namei.c are ucking fugly.
>
> Can you elaborate a bit? The locking in sysfs is unconventional but
> that's mostly from necessity. It has dual interface - vfs and driver
> model && vfs data structures (dentry and inode) are too big to always
> keep around, so it basically becomes a small distributed file system
> where the backing data can change asynchronously.

... with all fun that creates. As it is, you have those async changers
of backing data using VFS locking _under_ sysfs locks via lookup_one_noperm()
and yet it needs sysfs_mutex inside sysfs_lookup(). So you can't have
sysfs_get_dentry() under it. So you don't have exclusion with arseloads
of sysfs tree changes in there. Joy...

Frankly, with the current state of sysfs the last vestiges of arguments
used to push it into the tree back then are dead and buried. I'm not
blaming you, BTW - the shitpile *did* grow past the point where its
memory footprint became far too large and something needed to be done.
Unfortunately, it happened too late for that something being "get rid
of the entire mess" and now we are saddled with it for good.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-05 14:30:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

Hello.

Al Viro wrote:
> sysfs_get_dentry(),
> mutex_lock(&parent->d_inode->i_mutex);
> hitting parent->d_inode either NULL or very close to it, depending on your
> .config; most likely NULL, if offset of i_mutex is 0xb8 in your build.
> That's plausible - 0xb8 is what you'd get on UP build without spinlock
> debugging, lockdep, etc.
>
> Assuming that this is what we get, everything looks explainable - we
> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
> gets evicted. We don't have any exclusion, so while we are playing
> silly buggers with lookups in sysfs_get_dentry() we have parent become
> negative; the rest is obvious...

That part of code is walking down the sysfs tree from the s_root of
sysfs hierarchy and on each step parent is held using dget() while being
referenced, so I don't think they can turn negative there.

> AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
> and the way it plays with fs/namei.c are ucking fugly.

Can you elaborate a bit? The locking in sysfs is unconventional but
that's mostly from necessity. It has dual interface - vfs and driver
model && vfs data structures (dentry and inode) are too big to always
keep around, so it basically becomes a small distributed file system
where the backing data can change asynchronously.

> Could you stick
> if (!parent->d_inode)
> printk(KERN_WARNING "sysfs locking blows: %s",
> parent->d_name.name);
> right before
> mutex_lock(&parent->d_inode->i_mutex);
> dentry = lookup_one_noperm(cur->s_name, parent);
> mutex_unlock(&parent->d_inode->i_mutex);
> in sysfs_get_dentry() (fs/sysfs/dir.c) and verify that it does, indeed,
> trigger?

Yes, please.

Thanks.

--
tejun

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-05 07:50:39

by Al Viro

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Wed, Jan 02, 2008 at 04:16:42PM +0100, Gabor Gombas wrote:
> Heh, it seems talking about a bug makes it trigger:
>
> Jan 2 16:05:45 twister kernel: Unable to handle kernel NULL pointer dereference at 00000000000000b8 RIP:
> Jan 2 16:05:45 twister kernel: [<ffffffff804720a5>] mutex_lock+0x10/0x1d

> So the patch referenced above does not help. But I've found a very easy
> way to trigger the bug:
>
> - do a "cat /dev/zero > /dev/rfcomm0"
> - switch the phone off
> - switch the phone on, and the kernel oopses

sysfs_get_dentry(),
mutex_lock(&parent->d_inode->i_mutex);
hitting parent->d_inode either NULL or very close to it, depending on your
.config; most likely NULL, if offset of i_mutex is 0xb8 in your build.
That's plausible - 0xb8 is what you'd get on UP build without spinlock
debugging, lockdep, etc.

Assuming that this is what we get, everything looks explainable - we
have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
gets evicted. We don't have any exclusion, so while we are playing
silly buggers with lookups in sysfs_get_dentry() we have parent become
negative; the rest is obvious...

AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
and the way it plays with fs/namei.c are ucking fugly.

Could you stick
if (!parent->d_inode)
printk(KERN_WARNING "sysfs locking blows: %s",
parent->d_name.name);
right before
mutex_lock(&parent->d_inode->i_mutex);
dentry = lookup_one_noperm(cur->s_name, parent);
mutex_unlock(&parent->d_inode->i_mutex);
in sysfs_get_dentry() (fs/sysfs/dir.c) and verify that it does, indeed,
trigger?

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-04 01:05:42

by Dave Young

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Thu, Jan 03, 2008 at 02:16:21PM +0100, Gabor Gombas wrote:
> On Wed, Jan 02, 2008 at 04:16:42PM +0100, Gabor Gombas wrote:
>
> > So the patch referenced above does not help. But I've found a very easy
> > way to trigger the bug:
> >
> > - do a "cat /dev/zero > /dev/rfcomm0"
> > - switch the phone off
> > - switch the phone on, and the kernel oopses
>
> FYI I also remember having oopses with 2.6.22.<something> but I do not
> have those logs anymore. Also I now booted 2.6.20.7 and it did not oops.

Could you try the patch (on 2.6.24-rc6) following and check the debug messages?

diff -upr linux/net/bluetooth/rfcomm/tty.c linux.new/net/bluetooth/rfcomm/tty.c
--- linux/net/bluetooth/rfcomm/tty.c 2008-01-04 08:58:48.000000000 +0800
+++ linux.new/net/bluetooth/rfcomm/tty.c 2008-01-04 09:01:01.000000000 +0800
@@ -313,6 +313,7 @@ static void rfcomm_dev_del(struct rfcomm
{
BT_DBG("dev %p", dev);

+ dump_stack();
set_bit(RFCOMM_TTY_RELEASED, &dev->flags);
rfcomm_dev_put(dev);
}

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel

2008-01-02 14:48:08

by Gabor Gombas

[permalink] [raw]
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs

On Sat, Dec 29, 2007 at 04:07:04PM +0800, Dave Young wrote:

> Please try the -mm tree kernel, might have been fixed by :
> http://lkml.org/lkml/2007/11/18/141

I've applied the patch on top of 2.6.24-rc6 (I don't want to run -mm
kernels on this machine). We'll see what happens.

Gabor

--
---------------------------------------------------------
MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
---------------------------------------------------------

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bluez-devel