Return-Path: MIME-Version: 1.0 In-Reply-To: References: <1326717111-7031-1-git-send-email-Andrei.Emeltchenko.news@gmail.com> Date: Tue, 17 Jan 2012 20:46:41 -0200 Message-ID: Subject: Re: [RFCv2] Bluetooth: cancel_work instead of cancel_work_sync From: Ulisses Furquim To: Emeltchenko Andrei Cc: linux-bluetooth@vger.kernel.org Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-bluetooth-owner@vger.kernel.org List-ID: Hi Andrei, On Mon, Jan 16, 2012 at 10:43 AM, Ulisses Furquim wrote: > > Hi Andrei, > > On Mon, Jan 16, 2012 at 10:31 AM, Emeltchenko Andrei > wrote: > > From: Andrei Emeltchenko > > > > Fix deadlock when cancelling delayed work. > > > > [ ?584.676126] ====================================================== > > [ ?584.676126] [ INFO: possible circular locking dependency detected ] > > [ ?584.676126] 3.2.0-rc2niko+ #44 > > [ ?584.676126] ------------------------------------------------------- > > [ ?584.676126] kworker/u:1/30 is trying to acquire lock: > > [ ?584.676126] ?(&hdev->lock){+.+.+.}, at: [] hci_conn_timeout+0x6c/0x190 [bluetooth] > > [ ?584.676126] > > [ ?584.676126] but task is already holding lock: > > [ ?584.676126] ?((&(&conn->disc_work)->work)){+.+...}, at: [] process_one_work+0x108/0x460 > > [ ?584.676126] > > [ ?584.676126] which lock already depends on the new lock. > > [ ?584.676126] > > [ ?584.676126] > > [ ?584.676126] the existing dependency chain (in reverse order) is: > > [ ?584.676126] > > [ ?584.676126] -> #1 ((&(&conn->disc_work)->work)){+.+...}: > > [ ?584.676126] ? ? ? ?[] lock_acquire+0x88/0x110 > > [ ?584.676126] ? ? ? ?[] wait_on_work+0x61/0x210 > > [ ?584.676126] ? ? ? ?[] __cancel_work_timer+0x6a/0x110 > > [ ?584.676126] ? ? ? ?[] cancel_delayed_work_sync+0x10/0x20 > > [ ?584.676126] ? ? ? ?[] hci_event_packet+0x3b2b/0x4610 [bluetooth] > > [ ?584.676126] ? ? ? ?[] hci_rx_work+0x20e/0x4c0 [bluetooth] > > [ ?584.676126] ? ? ? ?[] process_one_work+0x17c/0x460 > > [ ?584.676126] ? ? ? ?[] worker_thread+0x124/0x2c0 > > [ ?584.676126] ? ? ? ?[] kthread+0x84/0x90 > > [ ?584.676126] ? ? ? ?[] kernel_thread_helper+0x6/0x10 > > [ ?584.676126] > > [ ?584.676126] -> #0 (&hdev->lock){+.+.+.}: > > [ ?584.676126] ? ? ? ?[] __lock_acquire+0xc0d/0x1ab0 > > [ ?584.676126] ? ? ? ?[] lock_acquire+0x88/0x110 > > [ ?584.676126] ? ? ? ?[] mutex_lock_nested+0x70/0x320 > > [ ?584.676126] ? ? ? ?[] hci_conn_timeout+0x6c/0x190 [bluetooth] > > [ ?584.676126] ? ? ? ?[] process_one_work+0x17c/0x460 > > [ ?584.676126] ? ? ? ?[] worker_thread+0x124/0x2c0 > > [ ?584.676126] ? ? ? ?[] kthread+0x84/0x90 > > [ ?584.676126] ? ? ? ?[] kernel_thread_helper+0x6/0x10 > > [ ?584.676126] > > [ ?584.676126] other info that might help us debug this: > > [ ?584.676126] > > [ ?584.676126] ?Possible unsafe locking scenario: > > [ ?584.676126] > > [ ?584.676126] ? ? ? ?CPU0 ? ? ? ? ? ? ? ? ? ?CPU1 > > [ ?584.676126] ? ? ? ?---- ? ? ? ? ? ? ? ? ? ?---- > > [ ?584.676126] ? lock((&(&conn->disc_work)->work)); > > [ ?584.676126] ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?lock(&hdev->lock); > > [ ?584.676126] ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?lock((&(&conn->disc_work)->work)); > > [ ?584.676126] ? lock(&hdev->lock); > > [ ?584.676126] > > [ ?584.676126] ?*** DEADLOCK *** > > [ ?584.676126] > > [ ?584.676126] 2 locks held by kworker/u:1/30: > > [ ?584.676126] ?#0: ?(hdev->name){.+.+.+}, at: [] process_one_work+0x108/0x460 > > [ ?584.676126] ?#1: ?((&(&conn->disc_work)->work)){+.+...}, at: [] process_one_work+0x108/0x460 > > > > Signed-off-by: Andrei Emeltchenko > > > > --- > > > > v2 - use cancel_delayed_work instead of cancel_delayed_work_sync following > > recommendation from Ulisses Furquim > > > > ?net/bluetooth/hci_conn.c | ? ?2 +- > > ?1 files changed, 1 insertions(+), 1 deletions(-) > > > > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c > > index 3db4324..e478051 100644 > > --- a/net/bluetooth/hci_conn.c > > +++ b/net/bluetooth/hci_conn.c > > @@ -437,7 +437,7 @@ int hci_conn_del(struct hci_conn *conn) > > > > ? ? ? ?del_timer(&conn->idle_timer); > > > > - ? ? ? cancel_delayed_work_sync(&conn->disc_work); > > + ? ? ? cancel_delayed_work(&conn->disc_work); > > > > ? ? ? ?del_timer(&conn->auto_accept_timer); > > With this you don't wait for pending work to finish but we may still > sleep IIRC. It might be even better to use __cancel_delayed_work() > instead of cancel_delayed_work(). I was checking this and I'm wrong and it should be fine to go with cancel_delayed_work(). I'm going to revise also uses of __cancel_delayed_work() because the timer to queue the work might be running and queue the delayed work again. :-/ I'll send patches to fix that. Regards, -- Ulisses Furquim ProFUSION embedded systems http://profusion.mobi Mobile: +55 19 9250 0942 Skype: ulissesffs