2012-01-31 08:43:01

by Andrei Emeltchenko

[permalink] [raw]
Subject: [RESEND 0/3] Fixes kernel crashes and warnings

From: Andrei Emeltchenko <[email protected]>

Set of patches fixes issues with delayed work and traversing RCU
lists.

Andrei Emeltchenko (3):
Bluetooth: Use cancel_work instead of cancel_work_sync
Bluetooth: Use list _safe deleting from conn_hash_list
Bluetooth: Use list _safe deleting from conn chan_list

net/bluetooth/hci_conn.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

--
1.7.4.1



2012-01-31 17:49:19

by Ulisses Furquim

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Marcel,

On Tue, Jan 31, 2012 at 1:58 PM, Marcel Holtmann <[email protected]> wrot=
e:
> Hi Ulisses,
>
>> >> > > - =A0 =A0 =A0 cancel_delayed_work_sync(&conn->disc_work);
>> >> > > + =A0 =A0 =A0 cancel_delayed_work(&conn->disc_work);
>> >> >
>> >> > I'm afraid we must use _sync variant here. disc_work is not suppose=
d to
>> >> > be running after hci_conn is deleted.
>> >> >
>> >> > BTW, I believe we already addressed this issue in patches [PATCH 1/=
2]
>> >> > Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
>> >> > unneeded locking. These patches are now pushed upstream. Could you
>> >>
>> >> I will check those patches from upstream and let you know.
>> >
>> > crap, I just acked these.
>> >
>> > Johan, forget about my acks and just ignore them. Lets wait until we g=
et
>> > a clean new series.
>> >
>> This change is really wrong because we're on the delete path and Andre
>> sent other patches which I'm almost sure will address this problem.
>
> lets do it this way, I only look at final patches that you and Andrei
> signed off / acked.

Sure, we can do that this way.

Regards,

--=20
Ulisses Furquim
ProFUSION embedded systems
http://profusion.mobi
Mobile: +55 19 9250 0942
Skype: ulissesffs

2012-01-31 15:58:44

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Ulisses,

> >> > > - cancel_delayed_work_sync(&conn->disc_work);
> >> > > + cancel_delayed_work(&conn->disc_work);
> >> >
> >> > I'm afraid we must use _sync variant here. disc_work is not supposed to
> >> > be running after hci_conn is deleted.
> >> >
> >> > BTW, I believe we already addressed this issue in patches [PATCH 1/2]
> >> > Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
> >> > unneeded locking. These patches are now pushed upstream. Could you
> >>
> >> I will check those patches from upstream and let you know.
> >
> > crap, I just acked these.
> >
> > Johan, forget about my acks and just ignore them. Lets wait until we get
> > a clean new series.
> >
> This change is really wrong because we're on the delete path and Andre
> sent other patches which I'm almost sure will address this problem.

lets do it this way, I only look at final patches that you and Andrei
signed off / acked.

Regards

Marcel



2012-01-31 15:41:47

by Ulisses Furquim

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi,

On Tue, Jan 31, 2012 at 1:11 PM, Marcel Holtmann <[email protected]> wrot=
e:
> Hi Andrei,
>
>> > > - =A0 =A0 =A0 cancel_delayed_work_sync(&conn->disc_work);
>> > > + =A0 =A0 =A0 cancel_delayed_work(&conn->disc_work);
>> >
>> > I'm afraid we must use _sync variant here. disc_work is not supposed t=
o
>> > be running after hci_conn is deleted.
>> >
>> > BTW, I believe we already addressed this issue in patches [PATCH 1/2]
>> > Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
>> > unneeded locking. These patches are now pushed upstream. Could you
>>
>> I will check those patches from upstream and let you know.
>
> crap, I just acked these.
>
> Johan, forget about my acks and just ignore them. Lets wait until we get
> a clean new series.
>
> Regards
>
> Marcel

This change is really wrong because we're on the delete path and Andre
sent other patches which I'm almost sure will address this problem.

Best regards,

--=20
Ulisses Furquim
ProFUSION embedded systems
http://profusion.mobi
Mobile: +55 19 9250 0942
Skype: ulissesffs

2012-01-31 15:11:37

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Andrei,

> > > - cancel_delayed_work_sync(&conn->disc_work);
> > > + cancel_delayed_work(&conn->disc_work);
> >
> > I'm afraid we must use _sync variant here. disc_work is not supposed to
> > be running after hci_conn is deleted.
> >
> > BTW, I believe we already addressed this issue in patches [PATCH 1/2]
> > Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
> > unneeded locking. These patches are now pushed upstream. Could you
>
> I will check those patches from upstream and let you know.

crap, I just acked these.

Johan, forget about my acks and just ignore them. Lets wait until we get
a clean new series.

Regards

Marcel



2012-01-31 15:08:10

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RESEND 3/3] Bluetooth: Use list _safe deleting from conn chan_list

Hi Andrei,

> Fixes possible bug when deleting element from the list in
> function hci_chan_list_flush. list_for_each_entry_rcu is used
> and after deleting element from the list we also free pointer
> and then list_entry_rcu is taken from freed pointer.
>
> Signed-off-by: Andrei Emeltchenko <[email protected]>
> ---
> net/bluetooth/hci_conn.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)

Acked-by: Marcel Holtmann <[email protected]>

Regards

Marcel



2012-01-31 15:07:42

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RESEND 2/3] Bluetooth: Use list _safe deleting from conn_hash_list

Hi Andrei,

> Use list_for_each_entry_safe which is safe version against removal
> of list entry. Otherwise we remove hci_conn element and reference
> next element which result in accessing LIST_POISON.
>
> [ 95.571834] Bluetooth: unknown link type 127
> [ 95.578349] BUG: unable to handle kernel paging request at 20002000
> [ 95.580236] IP: [<20002000>] 0x20001fff
> [ 95.580763] *pde = 00000000
> [ 95.581196] Oops: 0000 [#1] SMP
> ...
> [ 95.582298] Pid: 3355, comm: hciconfig Tainted: G O 3.2.0-rc2niko+ #62 innotek GmbH VirtualBox
> [ 95.582298] EIP: 0060:[<20002000>] EFLAGS: 00210206 CPU: 0
> [ 95.582298] EIP is at 0x20002000
> [ 95.582298] EAX: ea4bc000 EBX: ea4bc000 ECX: 20002000 EDX: 00000016
> [ 95.582298] ESI: f49516f8 EDI: f4951008 EBP: ea4f1e6c ESP: ea4f1e50
> [ 95.582298] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [ 95.582298] Process hciconfig (pid: 3355, ti=ea4f0000 task=ea4a8000 task.ti=ea4f0000)
> [ 95.582298] Stack:
> [ 95.582298] f8231ab6 f825518d f8255178 0000007f ea4bc000 f4951000 f495163c ea4f1e98
> [ 95.582298] f822bcb1 f4951000 ea4f1e98 f822d679 0000000c f4951068 ea4a8000 f4951000
> [ 95.582298] ffffffed 00000000 ea4f1ea8 f822e1da 400448ca ea4f6800 ea4f1ee4 f824102f
> [ 95.582298] Call Trace:
> [ 95.582298] [<f8231ab6>] ? hci_conn_hash_flush+0x76/0xf0 [bluetooth]
> [ 95.582298] [<f822bcb1>] hci_dev_do_close+0xc1/0x2e0 [bluetooth]
> [ 95.582298] [<f822d679>] ? hci_dev_get+0x69/0xb0 [bluetooth]
> [ 95.582298] [<f822e1da>] hci_dev_close+0x2a/0x50 [bluetooth]
> [ 95.582298] [<f824102f>] hci_sock_ioctl+0x1af/0x3f0 [bluetooth]
> [ 95.582298] [<c11153ea>] ? handle_pte_fault+0x8a/0x8f0
> [ 95.582298] [<c146becf>] sock_ioctl+0x5f/0x260
> [ 95.582298] [<c146be70>] ? sock_fasync+0x90/0x90
> [ 95.582298] [<c1152b33>] do_vfs_ioctl+0x83/0x5b0
> [ 95.582298] [<c1563f87>] ? do_page_fault+0x297/0x500
> [ 95.582298] [<c1563cf0>] ? spurious_fault+0xd0/0xd0
> [ 95.582298] [<c107165b>] ? up_read+0x1b/0x30
> [ 95.582298] [<c1563f87>] ? do_page_fault+0x297/0x500
> [ 95.582298] [<c100aa9f>] ? init_fpu+0xef/0x160
> [ 95.582298] [<c15617c0>] ? do_debug+0x180/0x180
> [ 95.582298] [<c100a958>] ? fpu_finit+0x28/0x80
> [ 95.582298] [<c11530e7>] sys_ioctl+0x87/0x90
> [ 95.582298] [<c156795f>] sysenter_do_call+0x12/0x38
> ...
>
> Signed-off-by: Andrei Emeltchenko <[email protected]>
> ---
> net/bluetooth/hci_conn.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)

Acked-by: Marcel Holtmann <[email protected]>

Regards

Marcel



2012-01-31 15:07:04

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Andrei,

> Fix deadlock when cancelling delayed work.
>
> [ 584.676126] ======================================================
> [ 584.676126] [ INFO: possible circular locking dependency detected ]
> [ 584.676126] 3.2.0-rc2niko+ #44
> [ 584.676126] -------------------------------------------------------
> [ 584.676126] kworker/u:1/30 is trying to acquire lock:
> [ 584.676126] (&hdev->lock){+.+.+.}, at: [<f81f001c>] hci_conn_timeout+0x6c/0x190 [bluetooth]
> [ 584.676126]
> [ 584.676126] but task is already holding lock:
> [ 584.676126] ((&(&conn->disc_work)->work)){+.+...}, at: [<c1065a78>] process_one_work+0x108/0x460
> [ 584.676126]
> [ 584.676126] which lock already depends on the new lock.
> [ 584.676126]
> [ 584.676126]
> [ 584.676126] the existing dependency chain (in reverse order) is:
> [ 584.676126]
> [ 584.676126] -> #1 ((&(&conn->disc_work)->work)){+.+...}:
> [ 584.676126] [<c1086748>] lock_acquire+0x88/0x110
> [ 584.676126] [<c1066041>] wait_on_work+0x61/0x210
> [ 584.676126] [<c106630a>] __cancel_work_timer+0x6a/0x110
> [ 584.676126] [<c10663c0>] cancel_delayed_work_sync+0x10/0x20
> [ 584.676126] [<f81f935b>] hci_event_packet+0x3b2b/0x4610 [bluetooth]
> [ 584.676126] [<f81ea78e>] hci_rx_work+0x20e/0x4c0 [bluetooth]
> [ 584.676126] [<c1065aec>] process_one_work+0x17c/0x460
> [ 584.676126] [<c10672e4>] worker_thread+0x124/0x2c0
> [ 584.676126] [<c106be44>] kthread+0x84/0x90
> [ 584.676126] [<c1567f42>] kernel_thread_helper+0x6/0x10
> [ 584.676126]
> [ 584.676126] -> #0 (&hdev->lock){+.+.+.}:
> [ 584.676126] [<c10852cd>] __lock_acquire+0xc0d/0x1ab0
> [ 584.676126] [<c1086748>] lock_acquire+0x88/0x110
> [ 584.676126] [<c155de50>] mutex_lock_nested+0x70/0x320
> [ 584.676126] [<f81f001c>] hci_conn_timeout+0x6c/0x190 [bluetooth]
> [ 584.676126] [<c1065aec>] process_one_work+0x17c/0x460
> [ 584.676126] [<c10672e4>] worker_thread+0x124/0x2c0
> [ 584.676126] [<c106be44>] kthread+0x84/0x90
> [ 584.676126] [<c1567f42>] kernel_thread_helper+0x6/0x10
> [ 584.676126]
> [ 584.676126] other info that might help us debug this:
> [ 584.676126]
> [ 584.676126] Possible unsafe locking scenario:
> [ 584.676126]
> [ 584.676126] CPU0 CPU1
> [ 584.676126] ---- ----
> [ 584.676126] lock((&(&conn->disc_work)->work));
> [ 584.676126] lock(&hdev->lock);
> [ 584.676126] lock((&(&conn->disc_work)->work));
> [ 584.676126] lock(&hdev->lock);
> [ 584.676126]
> [ 584.676126] *** DEADLOCK ***
> [ 584.676126]
> [ 584.676126] 2 locks held by kworker/u:1/30:
> [ 584.676126] #0: (hdev->name){.+.+.+}, at: [<c1065a78>] process_one_work+0x108/0x460
> [ 584.676126] #1: ((&(&conn->disc_work)->work)){+.+...}, at: [<c1065a78>] process_one_work+0x108/0x460
>
> Signed-off-by: Andrei Emeltchenko <[email protected]>
>
> ---
> v2 - use cancel_delayed_work instead of cancel_delayed_work_sync following
> recommendation from Ulisses Furquim
> ---
> net/bluetooth/hci_conn.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)

Reviewed-by: Ulisses Furquim <[email protected]>
Acked-by: Marcel Holtmann <[email protected]>

Regards

Marcel



2012-01-31 13:02:43

by Andrei Emeltchenko

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Andre,

On Tue, Jan 31, 2012 at 09:31:45AM -0300, Andre Guedes wrote:
> > - ? ? ? cancel_delayed_work_sync(&conn->disc_work);
> > + ? ? ? cancel_delayed_work(&conn->disc_work);
>
> I'm afraid we must use _sync variant here. disc_work is not supposed to
> be running after hci_conn is deleted.
>
> BTW, I believe we already addressed this issue in patches [PATCH 1/2]
> Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
> unneeded locking. These patches are now pushed upstream. Could you

I will check those patches from upstream and let you know.

Best regards
Andrei Emeltchenko

2012-01-31 12:31:45

by Andre Guedes

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Andrei,

On Tue, Jan 31, 2012 at 5:43 AM, Emeltchenko Andrei
<[email protected]> wrote:
> From: Andrei Emeltchenko <[email protected]>
>
> Fix deadlock when cancelling delayed work.
>
> [ ?584.676126] ======================================================
> [ ?584.676126] [ INFO: possible circular locking dependency detected ]
> [ ?584.676126] 3.2.0-rc2niko+ #44
> [ ?584.676126] -------------------------------------------------------
> [ ?584.676126] kworker/u:1/30 is trying to acquire lock:
> [ ?584.676126] ?(&hdev->lock){+.+.+.}, at: [<f81f001c>] hci_conn_timeout+0x6c/0x190 [bluetooth]
> [ ?584.676126]
> [ ?584.676126] but task is already holding lock:
> [ ?584.676126] ?((&(&conn->disc_work)->work)){+.+...}, at: [<c1065a78>] process_one_work+0x108/0x460
> [ ?584.676126]
> [ ?584.676126] which lock already depends on the new lock.
> [ ?584.676126]
> [ ?584.676126]
> [ ?584.676126] the existing dependency chain (in reverse order) is:
> [ ?584.676126]
> [ ?584.676126] -> #1 ((&(&conn->disc_work)->work)){+.+...}:
> [ ?584.676126] ? ? ? ?[<c1086748>] lock_acquire+0x88/0x110
> [ ?584.676126] ? ? ? ?[<c1066041>] wait_on_work+0x61/0x210
> [ ?584.676126] ? ? ? ?[<c106630a>] __cancel_work_timer+0x6a/0x110
> [ ?584.676126] ? ? ? ?[<c10663c0>] cancel_delayed_work_sync+0x10/0x20
> [ ?584.676126] ? ? ? ?[<f81f935b>] hci_event_packet+0x3b2b/0x4610 [bluetooth]
> [ ?584.676126] ? ? ? ?[<f81ea78e>] hci_rx_work+0x20e/0x4c0 [bluetooth]
> [ ?584.676126] ? ? ? ?[<c1065aec>] process_one_work+0x17c/0x460
> [ ?584.676126] ? ? ? ?[<c10672e4>] worker_thread+0x124/0x2c0
> [ ?584.676126] ? ? ? ?[<c106be44>] kthread+0x84/0x90
> [ ?584.676126] ? ? ? ?[<c1567f42>] kernel_thread_helper+0x6/0x10
> [ ?584.676126]
> [ ?584.676126] -> #0 (&hdev->lock){+.+.+.}:
> [ ?584.676126] ? ? ? ?[<c10852cd>] __lock_acquire+0xc0d/0x1ab0
> [ ?584.676126] ? ? ? ?[<c1086748>] lock_acquire+0x88/0x110
> [ ?584.676126] ? ? ? ?[<c155de50>] mutex_lock_nested+0x70/0x320
> [ ?584.676126] ? ? ? ?[<f81f001c>] hci_conn_timeout+0x6c/0x190 [bluetooth]
> [ ?584.676126] ? ? ? ?[<c1065aec>] process_one_work+0x17c/0x460
> [ ?584.676126] ? ? ? ?[<c10672e4>] worker_thread+0x124/0x2c0
> [ ?584.676126] ? ? ? ?[<c106be44>] kthread+0x84/0x90
> [ ?584.676126] ? ? ? ?[<c1567f42>] kernel_thread_helper+0x6/0x10
> [ ?584.676126]
> [ ?584.676126] other info that might help us debug this:
> [ ?584.676126]
> [ ?584.676126] ?Possible unsafe locking scenario:
> [ ?584.676126]
> [ ?584.676126] ? ? ? ?CPU0 ? ? ? ? ? ? ? ? ? ?CPU1
> [ ?584.676126] ? ? ? ?---- ? ? ? ? ? ? ? ? ? ?----
> [ ?584.676126] ? lock((&(&conn->disc_work)->work));
> [ ?584.676126] ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?lock(&hdev->lock);
> [ ?584.676126] ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?lock((&(&conn->disc_work)->work));
> [ ?584.676126] ? lock(&hdev->lock);
> [ ?584.676126]
> [ ?584.676126] ?*** DEADLOCK ***
> [ ?584.676126]
> [ ?584.676126] 2 locks held by kworker/u:1/30:
> [ ?584.676126] ?#0: ?(hdev->name){.+.+.+}, at: [<c1065a78>] process_one_work+0x108/0x460
> [ ?584.676126] ?#1: ?((&(&conn->disc_work)->work)){+.+...}, at: [<c1065a78>] process_one_work+0x108/0x460
>
> Signed-off-by: Andrei Emeltchenko <[email protected]>
>
> ---
> v2 - use cancel_delayed_work instead of cancel_delayed_work_sync following
> recommendation from Ulisses Furquim
> ---
> ?net/bluetooth/hci_conn.c | ? ?2 +-
> ?1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> index 67c94c4..7ec6e54 100644
> --- a/net/bluetooth/hci_conn.c
> +++ b/net/bluetooth/hci_conn.c
> @@ -433,7 +433,7 @@ int hci_conn_del(struct hci_conn *conn)
>
> ? ? ? ?del_timer(&conn->idle_timer);
>
> - ? ? ? cancel_delayed_work_sync(&conn->disc_work);
> + ? ? ? cancel_delayed_work(&conn->disc_work);

I'm afraid we must use _sync variant here. disc_work is not supposed to
be running after hci_conn is deleted.

BTW, I believe we already addressed this issue in patches [PATCH 1/2]
Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
unneeded locking. These patches are now pushed upstream. Could you
verify if this lockdep message is still occurring in johan's bluetooth-next
tree?

BR,

Andre

2012-01-31 08:43:03

by Andrei Emeltchenko

[permalink] [raw]
Subject: [RESEND 2/3] Bluetooth: Use list _safe deleting from conn_hash_list

From: Andrei Emeltchenko <[email protected]>

Use list_for_each_entry_safe which is safe version against removal
of list entry. Otherwise we remove hci_conn element and reference
next element which result in accessing LIST_POISON.

[ 95.571834] Bluetooth: unknown link type 127
[ 95.578349] BUG: unable to handle kernel paging request at 20002000
[ 95.580236] IP: [<20002000>] 0x20001fff
[ 95.580763] *pde = 00000000
[ 95.581196] Oops: 0000 [#1] SMP
...
[ 95.582298] Pid: 3355, comm: hciconfig Tainted: G O 3.2.0-rc2niko+ #62 innotek GmbH VirtualBox
[ 95.582298] EIP: 0060:[<20002000>] EFLAGS: 00210206 CPU: 0
[ 95.582298] EIP is at 0x20002000
[ 95.582298] EAX: ea4bc000 EBX: ea4bc000 ECX: 20002000 EDX: 00000016
[ 95.582298] ESI: f49516f8 EDI: f4951008 EBP: ea4f1e6c ESP: ea4f1e50
[ 95.582298] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 95.582298] Process hciconfig (pid: 3355, ti=ea4f0000 task=ea4a8000 task.ti=ea4f0000)
[ 95.582298] Stack:
[ 95.582298] f8231ab6 f825518d f8255178 0000007f ea4bc000 f4951000 f495163c ea4f1e98
[ 95.582298] f822bcb1 f4951000 ea4f1e98 f822d679 0000000c f4951068 ea4a8000 f4951000
[ 95.582298] ffffffed 00000000 ea4f1ea8 f822e1da 400448ca ea4f6800 ea4f1ee4 f824102f
[ 95.582298] Call Trace:
[ 95.582298] [<f8231ab6>] ? hci_conn_hash_flush+0x76/0xf0 [bluetooth]
[ 95.582298] [<f822bcb1>] hci_dev_do_close+0xc1/0x2e0 [bluetooth]
[ 95.582298] [<f822d679>] ? hci_dev_get+0x69/0xb0 [bluetooth]
[ 95.582298] [<f822e1da>] hci_dev_close+0x2a/0x50 [bluetooth]
[ 95.582298] [<f824102f>] hci_sock_ioctl+0x1af/0x3f0 [bluetooth]
[ 95.582298] [<c11153ea>] ? handle_pte_fault+0x8a/0x8f0
[ 95.582298] [<c146becf>] sock_ioctl+0x5f/0x260
[ 95.582298] [<c146be70>] ? sock_fasync+0x90/0x90
[ 95.582298] [<c1152b33>] do_vfs_ioctl+0x83/0x5b0
[ 95.582298] [<c1563f87>] ? do_page_fault+0x297/0x500
[ 95.582298] [<c1563cf0>] ? spurious_fault+0xd0/0xd0
[ 95.582298] [<c107165b>] ? up_read+0x1b/0x30
[ 95.582298] [<c1563f87>] ? do_page_fault+0x297/0x500
[ 95.582298] [<c100aa9f>] ? init_fpu+0xef/0x160
[ 95.582298] [<c15617c0>] ? do_debug+0x180/0x180
[ 95.582298] [<c100a958>] ? fpu_finit+0x28/0x80
[ 95.582298] [<c11530e7>] sys_ioctl+0x87/0x90
[ 95.582298] [<c156795f>] sysenter_do_call+0x12/0x38
...

Signed-off-by: Andrei Emeltchenko <[email protected]>
---
net/bluetooth/hci_conn.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index 7ec6e54..91a83fb 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -791,11 +791,11 @@ timer:
void hci_conn_hash_flush(struct hci_dev *hdev)
{
struct hci_conn_hash *h = &hdev->conn_hash;
- struct hci_conn *c;
+ struct hci_conn *c, *n;

BT_DBG("hdev %s", hdev->name);

- list_for_each_entry_rcu(c, &h->list, list) {
+ list_for_each_entry_safe(c, n, &h->list, list) {
c->state = BT_CLOSED;

hci_proto_disconn_cfm(c, HCI_ERROR_LOCAL_HOST_TERM);
--
1.7.4.1


2012-01-31 08:43:02

by Andrei Emeltchenko

[permalink] [raw]
Subject: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

From: Andrei Emeltchenko <[email protected]>

Fix deadlock when cancelling delayed work.

[ 584.676126] ======================================================
[ 584.676126] [ INFO: possible circular locking dependency detected ]
[ 584.676126] 3.2.0-rc2niko+ #44
[ 584.676126] -------------------------------------------------------
[ 584.676126] kworker/u:1/30 is trying to acquire lock:
[ 584.676126] (&hdev->lock){+.+.+.}, at: [<f81f001c>] hci_conn_timeout+0x6c/0x190 [bluetooth]
[ 584.676126]
[ 584.676126] but task is already holding lock:
[ 584.676126] ((&(&conn->disc_work)->work)){+.+...}, at: [<c1065a78>] process_one_work+0x108/0x460
[ 584.676126]
[ 584.676126] which lock already depends on the new lock.
[ 584.676126]
[ 584.676126]
[ 584.676126] the existing dependency chain (in reverse order) is:
[ 584.676126]
[ 584.676126] -> #1 ((&(&conn->disc_work)->work)){+.+...}:
[ 584.676126] [<c1086748>] lock_acquire+0x88/0x110
[ 584.676126] [<c1066041>] wait_on_work+0x61/0x210
[ 584.676126] [<c106630a>] __cancel_work_timer+0x6a/0x110
[ 584.676126] [<c10663c0>] cancel_delayed_work_sync+0x10/0x20
[ 584.676126] [<f81f935b>] hci_event_packet+0x3b2b/0x4610 [bluetooth]
[ 584.676126] [<f81ea78e>] hci_rx_work+0x20e/0x4c0 [bluetooth]
[ 584.676126] [<c1065aec>] process_one_work+0x17c/0x460
[ 584.676126] [<c10672e4>] worker_thread+0x124/0x2c0
[ 584.676126] [<c106be44>] kthread+0x84/0x90
[ 584.676126] [<c1567f42>] kernel_thread_helper+0x6/0x10
[ 584.676126]
[ 584.676126] -> #0 (&hdev->lock){+.+.+.}:
[ 584.676126] [<c10852cd>] __lock_acquire+0xc0d/0x1ab0
[ 584.676126] [<c1086748>] lock_acquire+0x88/0x110
[ 584.676126] [<c155de50>] mutex_lock_nested+0x70/0x320
[ 584.676126] [<f81f001c>] hci_conn_timeout+0x6c/0x190 [bluetooth]
[ 584.676126] [<c1065aec>] process_one_work+0x17c/0x460
[ 584.676126] [<c10672e4>] worker_thread+0x124/0x2c0
[ 584.676126] [<c106be44>] kthread+0x84/0x90
[ 584.676126] [<c1567f42>] kernel_thread_helper+0x6/0x10
[ 584.676126]
[ 584.676126] other info that might help us debug this:
[ 584.676126]
[ 584.676126] Possible unsafe locking scenario:
[ 584.676126]
[ 584.676126] CPU0 CPU1
[ 584.676126] ---- ----
[ 584.676126] lock((&(&conn->disc_work)->work));
[ 584.676126] lock(&hdev->lock);
[ 584.676126] lock((&(&conn->disc_work)->work));
[ 584.676126] lock(&hdev->lock);
[ 584.676126]
[ 584.676126] *** DEADLOCK ***
[ 584.676126]
[ 584.676126] 2 locks held by kworker/u:1/30:
[ 584.676126] #0: (hdev->name){.+.+.+}, at: [<c1065a78>] process_one_work+0x108/0x460
[ 584.676126] #1: ((&(&conn->disc_work)->work)){+.+...}, at: [<c1065a78>] process_one_work+0x108/0x460

Signed-off-by: Andrei Emeltchenko <[email protected]>

---
v2 - use cancel_delayed_work instead of cancel_delayed_work_sync following
recommendation from Ulisses Furquim
---
net/bluetooth/hci_conn.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index 67c94c4..7ec6e54 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -433,7 +433,7 @@ int hci_conn_del(struct hci_conn *conn)

del_timer(&conn->idle_timer);

- cancel_delayed_work_sync(&conn->disc_work);
+ cancel_delayed_work(&conn->disc_work);

del_timer(&conn->auto_accept_timer);

--
1.7.4.1


2012-01-31 08:43:04

by Andrei Emeltchenko

[permalink] [raw]
Subject: [RESEND 3/3] Bluetooth: Use list _safe deleting from conn chan_list

From: Andrei Emeltchenko <[email protected]>

Fixes possible bug when deleting element from the list in
function hci_chan_list_flush. list_for_each_entry_rcu is used
and after deleting element from the list we also free pointer
and then list_entry_rcu is taken from freed pointer.

Signed-off-by: Andrei Emeltchenko <[email protected]>
---
net/bluetooth/hci_conn.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index 91a83fb..c751ecc 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -971,10 +971,10 @@ int hci_chan_del(struct hci_chan *chan)

void hci_chan_list_flush(struct hci_conn *conn)
{
- struct hci_chan *chan;
+ struct hci_chan *chan, *n;

BT_DBG("conn %p", conn);

- list_for_each_entry_rcu(chan, &conn->chan_list, list)
+ list_for_each_entry_safe(chan, n, &conn->chan_list, list)
hci_chan_del(chan);
}
--
1.7.4.1


2012-02-02 08:34:23

by Andrei Emeltchenko

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Marcel,

On Wed, Feb 01, 2012 at 07:53:30AM -0800, Marcel Holtmann wrote:
> > > >> >> > > - cancel_delayed_work_sync(&conn->disc_work);
> > > >> >> > > + cancel_delayed_work(&conn->disc_work);
> > > >> >> >
> > > >> >> > I'm afraid we must use _sync variant here. disc_work is not supposed to
> > > >> >> > be running after hci_conn is deleted.
> > > >> >> >
> > > >> >> > BTW, I believe we already addressed this issue in patches [PATCH 1/2]
> > > >> >> > Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
> > > >> >> > unneeded locking. These patches are now pushed upstream. Could you
> > > >> >>
> > > >> >> I will check those patches from upstream and let you know.
> > > >> >
> > > >> > crap, I just acked these.
> > > >> >
> > > >> > Johan, forget about my acks and just ignore them. Lets wait until we get
> > > >> > a clean new series.
> > > >> >
> > > >> This change is really wrong because we're on the delete path and Andre
> > > >> sent other patches which I'm almost sure will address this problem.
> > > >
> > > > lets do it this way, I only look at final patches that you and Andrei
> > > > signed off / acked.
> > >
> > > Sure, we can do that this way.
> >
> > I've checked recent upstream code and it works fine so far, please forget
> > about this particular patch.
> >
> > The other 2 patches are still valid.
>
> create a new patch series for it.

Done. Johan, please check the patches in new series.

Best regards
Andrei Emeltchenko

2012-02-01 15:53:30

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Andrei,

> > >> >> > > - cancel_delayed_work_sync(&conn->disc_work);
> > >> >> > > + cancel_delayed_work(&conn->disc_work);
> > >> >> >
> > >> >> > I'm afraid we must use _sync variant here. disc_work is not supposed to
> > >> >> > be running after hci_conn is deleted.
> > >> >> >
> > >> >> > BTW, I believe we already addressed this issue in patches [PATCH 1/2]
> > >> >> > Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
> > >> >> > unneeded locking. These patches are now pushed upstream. Could you
> > >> >>
> > >> >> I will check those patches from upstream and let you know.
> > >> >
> > >> > crap, I just acked these.
> > >> >
> > >> > Johan, forget about my acks and just ignore them. Lets wait until we get
> > >> > a clean new series.
> > >> >
> > >> This change is really wrong because we're on the delete path and Andre
> > >> sent other patches which I'm almost sure will address this problem.
> > >
> > > lets do it this way, I only look at final patches that you and Andrei
> > > signed off / acked.
> >
> > Sure, we can do that this way.
>
> I've checked recent upstream code and it works fine so far, please forget
> about this particular patch.
>
> The other 2 patches are still valid.

create a new patch series for it.

Regards

Marcel



2012-02-01 13:32:56

by Andrei Emeltchenko

[permalink] [raw]
Subject: Re: [RESEND 1/3] Bluetooth: Use cancel_work instead of cancel_work_sync

Hi Ulisses,

On Tue, Jan 31, 2012 at 03:49:19PM -0200, Ulisses Furquim wrote:
> Hi Marcel,
>
> On Tue, Jan 31, 2012 at 1:58 PM, Marcel Holtmann <[email protected]> wrote:
> > Hi Ulisses,
> >
> >> >> > > - ? ? ? cancel_delayed_work_sync(&conn->disc_work);
> >> >> > > + ? ? ? cancel_delayed_work(&conn->disc_work);
> >> >> >
> >> >> > I'm afraid we must use _sync variant here. disc_work is not supposed to
> >> >> > be running after hci_conn is deleted.
> >> >> >
> >> >> > BTW, I believe we already addressed this issue in patches [PATCH 1/2]
> >> >> > Bluetooth: Fix potential deadlock and [PATCH 2/2] Bluetooth: Remove
> >> >> > unneeded locking. These patches are now pushed upstream. Could you
> >> >>
> >> >> I will check those patches from upstream and let you know.
> >> >
> >> > crap, I just acked these.
> >> >
> >> > Johan, forget about my acks and just ignore them. Lets wait until we get
> >> > a clean new series.
> >> >
> >> This change is really wrong because we're on the delete path and Andre
> >> sent other patches which I'm almost sure will address this problem.
> >
> > lets do it this way, I only look at final patches that you and Andrei
> > signed off / acked.
>
> Sure, we can do that this way.

I've checked recent upstream code and it works fine so far, please forget
about this particular patch.

The other 2 patches are still valid.

Best regards
Andrei Emeltchenko