Return-Path: Date: Mon, 23 Apr 2007 23:56:39 +0200 (CEST) From: Jiri Kosina To: Jeremy Fitzhardinge cc: Greg KH , Marcel Holtmann , maxk@qualcomm.com, bluez-devel@lists.sourceforge.net, Cedric Le Goater , Linux Kernel Mailing List Subject: Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523 In-Reply-To: <462D1B09.9050005@goop.org> Message-ID: References: <462D1B09.9050005@goop.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII List-ID: On Mon, 23 Apr 2007, Jeremy Fitzhardinge wrote: > I got this on resume; it looks like a Bluetooth and/or USB problem. > PM: Removing info for No Bus:hci0 > BUG: sleeping function called from invalid context at net/core/sock.c:1523 > in_atomic():1, irqs_disabled():0 > 1 lock held by khubd/180: > #0: (old_style_rw_init#2){-.-?}, at: [] hci_sock_dev_event+0x42/0xc5 [bluetooth] > [] show_trace_log_lvl+0x1a/0x30 > [] show_trace+0x12/0x14 > [] dump_stack+0x16/0x18 > [] __might_sleep+0xe5/0xeb > [] lock_sock_nested+0x1d/0xc4 > [] hci_sock_dev_event+0xa7/0xc5 [bluetooth] > [] notifier_call_chain+0x20/0x3c > [] atomic_notifier_call_chain+0x27/0x50 > [] hci_notify+0x12/0x14 [bluetooth] > [] hci_unregister_dev+0x4c/0x65 [bluetooth] > [] hci_usb_disconnect+0x42/0x6f [hci_usb] > [] usb_unbind_interface+0x33/0x69 > [] __device_release_driver+0x74/0x90 > [] device_release_driver+0x33/0x4b > [] bus_remove_device+0x73/0x82 > [] device_del+0x169/0x1cf > [] usb_disable_device+0x62/0xc2 > [] usb_disconnect+0x95/0x114 > [] hub_thread+0x2e2/0x99b > [] kthread+0xb5/0xe2 > [] kernel_thread_helper+0x7/0x10 > ======================= > PM: Removing info for usb:4-1:1.0 OK, this probably started happening since b40df5743. Before that commit, hci_sock_dev_event() used bh_lock_sock() to lock the corresponding struct sock. This was obviously buggy - not deadlock safe against l2cap_connect_cfm() from softirq context. This however introduced another problem - hci_sock_dev_event() is now obviously being triggered (for HCI_DEV_UNREG event, when suspending) in atomic context with preemption disabled. This is what lock_sock_nested() complains about, as it is allowed to sleep inside __lock_sock(), waiting for the lock owner. Hmm, *sigh*. I guess the patch below fixes the problem, but it is a masterpiece in the field of ugliness. And I am not sure whether it is completely correct either. Are there any immediate ideas for better solution with respect to how struct sock locking works? diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 71f5cfb..c5c93cd 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -656,7 +656,10 @@ static int hci_sock_dev_event(struct notifier_block *this, unsigned long event, /* Detach sockets from device */ read_lock(&hci_sk_list.lock); sk_for_each(sk, node, &hci_sk_list.head) { - lock_sock(sk); + if (in_atomic()) + bh_lock_sock(sk); + else + lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; sk->sk_err = EPIPE; @@ -665,7 +668,10 @@ static int hci_sock_dev_event(struct notifier_block *this, unsigned long event, hci_dev_put(hdev); } - release_sock(sk); + if (in_atomic()) + bh_unlock_sock(sk); + else + release_sock(sk); } read_unlock(&hci_sk_list.lock); }