2012-03-31 13:23:38

by Alexander Holler

Subject: bluetooth: fix deadlock on device reset and power down

Hello,

I've experienced a deadlock on shutdown using kernel 3.3 and tracked it
down. Because I'm not very familiar with the bluetooth stack I'm not
sure if the below patch is correct, but it fixed the problem here.

This patch should go to the stable tree too, if approved.

Regards,

Alexander

--------------
From 9d0902dc07504ab28a31de471cfb3225fb0404c6 Mon Sep 17 00:00:00 2001
From: Alexander Holler <[email protected]>
Date: Sat, 31 Mar 2012 15:03:27 +0200
Subject: [PATCH] bluetooth: fix deadlock on device reset and power down

Commit 09fd0de5bd8f8ef3317e5365f92f1a13dcd89aa9 introduced a deadlock:

bluetoothd calls ioctl HCIDEVDOWN
    hci_sock_ioctl()
        hci_dev_close()
            hci_dev_do_close()
                hci_dev_lock(hdev);
                inquiry_cache_flush();
                hci_conn_hash_flush();
                    hci_conn_del()
                        cancel_delayed_work_sync()
                            hci_conn_timeout()
                                hci_dev_lock(hdev); /* DEADLOCK */
                hci_dev_unlock(hdev);

Signed-off-by: Alexander Holler <[email protected]>
---
net/bluetooth/hci_core.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 5aeb624..3428036 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -629,8 +629,8 @@ static int hci_dev_do_close(struct hci_dev *hdev)
 
 	hci_dev_lock(hdev);
 	inquiry_cache_flush(hdev);
-	hci_conn_hash_flush(hdev);
 	hci_dev_unlock(hdev);
+	hci_conn_hash_flush(hdev);
 
 	hci_notify(hdev, HCI_DEV_DOWN);
 
@@ -713,8 +713,8 @@ int hci_dev_reset(__u16 dev)
 
 	hci_dev_lock(hdev);
 	inquiry_cache_flush(hdev);
-	hci_conn_hash_flush(hdev);
 	hci_dev_unlock(hdev);
+	hci_conn_hash_flush(hdev);
 
 	if (hdev->flush)
 		hdev->flush(hdev);
--
1.7.6.5
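
The call chain in the commit message can be reproduced in a small
single-threaded C model. This is only a sketch under stated assumptions:
every toy_* name is hypothetical and not a kernel API, the device lock is a
plain flag rather than a kernel mutex, and a timeout handler that has
already started is modelled by running it inline from the synchronous
cancel. Instead of blocking, the model records that a hang would occur.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct hci_dev; all names here are hypothetical. */
struct toy_dev {
    bool locked;        /* models a non-recursive device lock */
    bool work_running;  /* a timeout handler has started but not finished */
    bool deadlocked;    /* set where a real lock would block forever */
};

/* Returns false if the lock is already held (a real lock would block). */
bool toy_lock(struct toy_dev *d)
{
    if (d->locked)
        return false;
    d->locked = true;
    return true;
}

void toy_unlock(struct toy_dev *d)
{
    d->locked = false;
}

/* Models a timeout handler that takes the device lock. */
void toy_conn_timeout(struct toy_dev *d)
{
    if (!toy_lock(d)) {
        d->deadlocked = true;   /* the canceller holds the lock and waits for us */
        return;
    }
    /* ... timeout handling would go here ... */
    toy_unlock(d);
}

/* Models a synchronous cancel: it must wait for a running handler,
 * which this model expresses by running the handler to completion. */
void toy_cancel_sync(struct toy_dev *d)
{
    if (d->work_running) {
        d->work_running = false;
        toy_conn_timeout(d);
    }
}

/* Ordering before the patch: flush (and cancel) while holding the lock. */
bool toy_close_flush_locked(struct toy_dev *d)
{
    toy_lock(d);
    toy_cancel_sync(d);
    toy_unlock(d);
    return !d->deadlocked;
}

/* Ordering after the patch: drop the lock, then flush. */
bool toy_close_flush_unlocked(struct toy_dev *d)
{
    toy_lock(d);
    toy_unlock(d);
    toy_cancel_sync(d);
    return !d->deadlocked;
}
```

With work_running set, toy_close_flush_locked() reports the hang and
toy_close_flush_unlocked() does not. The model says nothing about the races
that flushing without the lock may open up, which a single-threaded sketch
cannot show.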



2012-04-03 08:37:19

by Alexander Holler

Subject: Re: bluetooth: fix deadlock on device reset and power down

Hello Genes MailLists, ;)

Am 02.04.2012 16:27, schrieb Genes MailLists:
>
>
> Hi - is this a related issue? (crash at shutdown/sleep):
>
> https://bugzilla.kernel.org/show_bug.cgi?id=42975

Looks likely.

Have you tried whether the patch from Andre Guedes (commit
e72acc13c770a82b4ce4a07e9716f29320eae0f8 in Linus' tree) helps?

Regards,

Alexander

2012-04-02 14:27:26

by Genes Lists

Subject: Re: bluetooth: fix deadlock on device reset and power down



Hi - is this a related issue? (crash at shutdown/sleep):

https://bugzilla.kernel.org/show_bug.cgi?id=42975

gene


2012-04-02 13:52:16

by Andre Guedes

Subject: Re: bluetooth: fix deadlock on device reset and power down

Hi all,

On Mon, Apr 2, 2012 at 7:16 AM, Alexander Holler <[email protected]> wrote:
> Am 02.04.2012 11:17, schrieb Alexander Holler:
>
>> Am 02.04.2012 11:03, schrieb Andrei Emeltchenko:
>>>
>>> Hi guys,
>>>
>>> On Mon, Apr 02, 2012 at 10:44:43AM +0200, David Herrmann wrote:
>>>>
>>>> Hi Andrei and Alexander
>>>>
>>>> On Mon, Apr 2, 2012 at 10:29 AM, Alexander Holler<[email protected]>
>>>> wrote:
>>>>>
>>>>> Am 02.04.2012 08:55, schrieb Andrei Emeltchenko:
>>>>>>
>>>>>> Hi Alexander,
>>>>>>
>>>>>> On Sat, Mar 31, 2012 at 03:23:38PM +0200, Alexander Holler wrote:
>>>>>>>
>>>>>>> I've experienced a deadlock on shutdown using kernel 3.3 and tracked
>>>>>>> it down. Because I'm not very familiar with the bluetooth stack I'm
>>>>>>> not sure if the below patch is correct, but it fixed the problem
>>>>>>> here.
>>>>>>
>>>>>>
>>>>>> Could you please attach deadlock dump?
>>>>>>
>>>>>>>
>>>>>>> Commit 09fd0de5bd8f8ef3317e5365f92f1a13dcd89aa9 introduced a
>>>>>>> deadlock:
>>>>>>>
>>>>>>> bluetoothd calls ioctl HCIDEVDOWN
>>>>>>>     hci_sock_ioctl()
>>>>>>>         hci_dev_close()
>>>>>>>             hci_dev_do_close()
>>>>>>>                 hci_dev_lock(hdev);
>>>>>>>                 inquiry_cache_flush();
>>>>>>>                 hci_conn_hash_flush();
>>>>>>>                     hci_conn_del()
>>>>>>>                         cancel_delayed_work_sync()
>>>>>>>                             hci_conn_timeout()
>>>>>>>                                 hci_dev_lock(hdev); /* DEADLOCK */
>>>>>>
>>>>>>
>>>>>> I am actually not sure that hci_conn_timeout locks hdev. Why do you
>>>>>> think
>>>>>> so?
>>>>>
>>>>>
>>>>> By reading the source, printk and suffering through the deadlock. It's
>>>>> especially painful when using a bt-keyboard and systemd, because
>>>>> systemd tries 4 times (~ some minutes) to kill bluetoothd before it
>>>>> marks the service as failed and finally continues to shut down.
>>>>
>>>>
>>>> hci_conn_timeout does lock the device. See the source. But the problem
>>>
>>>
>>> I think you need to check commit e72acc13c770a82b4ce4a07e9716f29320eae0f8
>>>
>>> commit e72acc13c770a82b4ce4a07e9716f29320eae0f8
>>> Author: Andre Guedes<[email protected]>
>>> Date:   Fri Jan 27 19:42:03 2012 -0300
>>>
>>>     Bluetooth: Remove unneeded locking
>>>
>>>     We don't need locking hdev in hci_conn_timeout() since it doesn't
>>>     access any hdev's shared resources, it basically queues HCI commands.
>>
>>
>> So if the locks in hci_conn_timeout() aren't needed, your commit which
>> removes them should go to the stable tree because it fixes a painful
>> deadlock.
>
>
> Oh, sorry, that patch is not from you; your first name is only similar.
> I've added the author to cc, just in case the lock might still be needed in
> 3.3 if no other patches (from 3.4) besides that one are applied.

It is still applicable. It was applied to the bluetooth-next tree a week
after the 3.3 merge window closed, which is why it is not
present in 3.3.

BR,

Andre

2012-04-02 10:16:32

by Alexander Holler

Subject: Re: bluetooth: fix deadlock on device reset and power down

Am 02.04.2012 11:17, schrieb Alexander Holler:
> Am 02.04.2012 11:03, schrieb Andrei Emeltchenko:
>> Hi guys,
>>
>> On Mon, Apr 02, 2012 at 10:44:43AM +0200, David Herrmann wrote:
>>> Hi Andrei and Alexander
>>>
>>> On Mon, Apr 2, 2012 at 10:29 AM, Alexander Holler<[email protected]> wrote:
>>>> Am 02.04.2012 08:55, schrieb Andrei Emeltchenko:
>>>>> Hi Alexander,
>>>>>
>>>>> On Sat, Mar 31, 2012 at 03:23:38PM +0200, Alexander Holler wrote:
>>>>>> I've experienced a deadlock on shutdown using kernel 3.3 and tracked
>>>>>> it down. Because I'm not very familiar with the bluetooth stack I'm
>>>>>> not sure if the below patch is correct, but it fixed the problem
>>>>>> here.
>>>>>
>>>>> Could you please attach deadlock dump?
>>>>>
>>>>>>
>>>>>> Commit 09fd0de5bd8f8ef3317e5365f92f1a13dcd89aa9 introduced a deadlock:
>>>>>>
>>>>>> bluetoothd calls ioctl HCIDEVDOWN
>>>>>>     hci_sock_ioctl()
>>>>>>         hci_dev_close()
>>>>>>             hci_dev_do_close()
>>>>>>                 hci_dev_lock(hdev);
>>>>>>                 inquiry_cache_flush();
>>>>>>                 hci_conn_hash_flush();
>>>>>>                     hci_conn_del()
>>>>>>                         cancel_delayed_work_sync()
>>>>>>                             hci_conn_timeout()
>>>>>>                                 hci_dev_lock(hdev); /* DEADLOCK */
>>>>>
>>>>> I am actually not sure that hci_conn_timeout locks hdev. Why do you think
>>>>> so?
>>>>
>>>> By reading the source, printk and suffering through the deadlock. It's
>>>> especially painful when using a bt-keyboard and systemd, because
>>>> systemd tries 4 times (~ some minutes) to kill bluetoothd before it
>>>> marks the service as failed and finally continues to shut down.
>>>
>>> hci_conn_timeout does lock the device. See the source. But the problem
>>
>> I think you need to check commit e72acc13c770a82b4ce4a07e9716f29320eae0f8
>>
>> commit e72acc13c770a82b4ce4a07e9716f29320eae0f8
>> Author: Andre Guedes<[email protected]>
>> Date: Fri Jan 27 19:42:03 2012 -0300
>>
>> Bluetooth: Remove unneeded locking
>>
>> We don't need locking hdev in hci_conn_timeout() since it doesn't
>> access any hdev's shared resources, it basically queues HCI commands.
>
> So if the locks in hci_conn_timeout() aren't needed, your commit which
> removes them should go to the stable tree because it fixes a painful
> deadlock.

Oh, sorry, that patch is not from you; your first name is only similar.
I've added the author to cc, just in case the lock might still be needed
in 3.3 if no other patches (from 3.4) besides that one are applied.

Regards,

Alexander

2012-04-02 09:17:12

by Alexander Holler

Subject: Re: bluetooth: fix deadlock on device reset and power down

Am 02.04.2012 11:03, schrieb Andrei Emeltchenko:
> Hi guys,
>
> On Mon, Apr 02, 2012 at 10:44:43AM +0200, David Herrmann wrote:
>> Hi Andrei and Alexander
>>
>> On Mon, Apr 2, 2012 at 10:29 AM, Alexander Holler <[email protected]> wrote:
>>> Am 02.04.2012 08:55, schrieb Andrei Emeltchenko:
>>>> Hi Alexander,
>>>>
>>>> On Sat, Mar 31, 2012 at 03:23:38PM +0200, Alexander Holler wrote:
>>>>> I've experienced a deadlock on shutdown using kernel 3.3 and tracked
>>>>> it down. Because I'm not very familiar with the bluetooth stack I'm
>>>>> not sure if the below patch is correct, but it fixed the problem
>>>>> here.
>>>>
>>>> Could you please attach deadlock dump?
>>>>
>>>>>
>>>>> Commit 09fd0de5bd8f8ef3317e5365f92f1a13dcd89aa9 introduced a deadlock:
>>>>>
>>>>> bluetoothd calls ioctl HCIDEVDOWN
>>>>>     hci_sock_ioctl()
>>>>>         hci_dev_close()
>>>>>             hci_dev_do_close()
>>>>>                 hci_dev_lock(hdev);
>>>>>                 inquiry_cache_flush();
>>>>>                 hci_conn_hash_flush();
>>>>>                     hci_conn_del()
>>>>>                         cancel_delayed_work_sync()
>>>>>                             hci_conn_timeout()
>>>>>                                 hci_dev_lock(hdev); /* DEADLOCK */
>>>>
>>>> I am actually not sure that hci_conn_timeout locks hdev. Why do you think
>>>> so?
>>>
>>> By reading the source, printk and suffering through the deadlock. It's
>>> especially painful when using a bt-keyboard and systemd, because
>>> systemd tries 4 times (~ some minutes) to kill bluetoothd before it
>>> marks the service as failed and finally continues to shut down.
>>
>> hci_conn_timeout does lock the device. See the source. But the problem
>
> I think you need to check commit e72acc13c770a82b4ce4a07e9716f29320eae0f8
>
> commit e72acc13c770a82b4ce4a07e9716f29320eae0f8
> Author: Andre Guedes <[email protected]>
> Date: Fri Jan 27 19:42:03 2012 -0300
>
> Bluetooth: Remove unneeded locking
>
> We don't need locking hdev in hci_conn_timeout() since it doesn't
> access any hdev's shared resources, it basically queues HCI commands.

So if the locks in hci_conn_timeout() aren't needed, your commit which
removes them should go to the stable tree because it fixes a painful
deadlock.

Regards,

Alexander


2012-04-02 09:03:46

by Andrei Emeltchenko

Subject: Re: bluetooth: fix deadlock on device reset and power down

Hi guys,

On Mon, Apr 02, 2012 at 10:44:43AM +0200, David Herrmann wrote:
> Hi Andrei and Alexander
>
> On Mon, Apr 2, 2012 at 10:29 AM, Alexander Holler <[email protected]> wrote:
> > Am 02.04.2012 08:55, schrieb Andrei Emeltchenko:
> >> Hi Alexander,
> >>
> >> On Sat, Mar 31, 2012 at 03:23:38PM +0200, Alexander Holler wrote:
> >>> I've experienced a deadlock on shutdown using kernel 3.3 and tracked
> >>> it down. Because I'm not very familiar with the bluetooth stack I'm
> >>> not sure if the below patch is correct, but it fixed the problem
> >>> here.
> >>
> >> Could you please attach deadlock dump?
> >>
> >>>
> >>> Commit 09fd0de5bd8f8ef3317e5365f92f1a13dcd89aa9 introduced a deadlock:
> >>>
> >>> bluetoothd calls ioctl HCIDEVDOWN
> >>>     hci_sock_ioctl()
> >>>         hci_dev_close()
> >>>             hci_dev_do_close()
> >>>                 hci_dev_lock(hdev);
> >>>                 inquiry_cache_flush();
> >>>                 hci_conn_hash_flush();
> >>>                     hci_conn_del()
> >>>                         cancel_delayed_work_sync()
> >>>                             hci_conn_timeout()
> >>>                                 hci_dev_lock(hdev); /* DEADLOCK */
> >>
> >> I am actually not sure that hci_conn_timeout locks hdev. Why do you think
> >> so?
> >
> > By reading the source, printk and suffering through the deadlock. It's
> > especially painful when using a bt-keyboard and systemd, because
> > systemd tries 4 times (~ some minutes) to kill bluetoothd before it
> > marks the service as failed and finally continues to shut down.
>
> hci_conn_timeout does lock the device. See the source. But the problem

I think you need to check commit e72acc13c770a82b4ce4a07e9716f29320eae0f8

commit e72acc13c770a82b4ce4a07e9716f29320eae0f8
Author: Andre Guedes <[email protected]>
Date: Fri Jan 27 19:42:03 2012 -0300

Bluetooth: Remove unneeded locking

We don't need locking hdev in hci_conn_timeout() since it doesn't
access any hdev's shared resources, it basically queues HCI commands.

Best regards
Andrei Emeltchenko


2012-04-02 08:44:43

by David Herrmann

Subject: Re: bluetooth: fix deadlock on device reset and power down

Hi Andrei and Alexander

On Mon, Apr 2, 2012 at 10:29 AM, Alexander Holler <[email protected]> wrote:
> Am 02.04.2012 08:55, schrieb Andrei Emeltchenko:
>> Hi Alexander,
>>
>> On Sat, Mar 31, 2012 at 03:23:38PM +0200, Alexander Holler wrote:
>>> I've experienced a deadlock on shutdown using kernel 3.3 and tracked
>>> it down. Because I'm not very familiar with the bluetooth stack I'm
>>> not sure if the below patch is correct, but it fixed the problem
>>> here.
>>
>> Could you please attach deadlock dump?
>>
>>>
>>> Commit 09fd0de5bd8f8ef3317e5365f92f1a13dcd89aa9 introduced a deadlock:
>>>
>>> bluetoothd calls ioctl HCIDEVDOWN
>>>     hci_sock_ioctl()
>>>         hci_dev_close()
>>>             hci_dev_do_close()
>>>                 hci_dev_lock(hdev);
>>>                 inquiry_cache_flush();
>>>                 hci_conn_hash_flush();
>>>                     hci_conn_del()
>>>                         cancel_delayed_work_sync()
>>>                             hci_conn_timeout()
>>>                                 hci_dev_lock(hdev); /* DEADLOCK */
>>
>> I am actually not sure that hci_conn_timeout locks hdev. Why do you think
>> so?
>
> By reading the source, printk and suffering through the deadlock. It's
> especially painful when using a bt-keyboard and systemd, because
> systemd tries 4 times (~ some minutes) to kill bluetoothd before it
> marks the service as failed and finally continues to shut down.

hci_conn_timeout does lock the device. See the source. But the problem
here is actually a race-condition, too. The do_close() code locks the
device and then cancels all workqueues in a synchronous manner.
However, the hci_conn_timeout work might get started exactly before
calling cancel_delayed_work_sync(). The proper fix would probably be
releasing the lock before calling "cancel_delayed_work_sync()".
However, then we need to make sure that the work is not restarted
while we do not have the lock.
I think we recently introduced some flag that is set while closing a
device. How about checking that in hci_conn_timeout before acquiring
the lock?
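
The flag check proposed above might look roughly like the following
single-threaded C sketch. It rests on several assumptions: the toy_* names
and the closing field are hypothetical (the thread does not name the actual
flag), the lock is a plain boolean, and a handler that has already started
is modelled by running it inline where the synchronous cancel would wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct hci_dev; all names here are hypothetical. */
struct toy_dev {
    bool locked;           /* models a non-recursive device lock */
    bool closing;          /* hypothetical "device is going down" flag */
    bool work_running;     /* a timeout handler has started but not finished */
    bool deadlocked;       /* set where a real lock would block forever */
    int  timeouts_handled; /* counts handlers that did real work */
};

/* Timeout handler that checks the flag before touching the lock. */
void toy_conn_timeout(struct toy_dev *d)
{
    if (d->closing)
        return;                 /* device is closing: do not take the lock */

    if (d->locked) {
        d->deadlocked = true;   /* a real lock would block here */
        return;
    }
    d->locked = true;
    d->timeouts_handled++;      /* ... timeout handling would go here ... */
    d->locked = false;
}

/* Close path: set the flag first, then cancel while holding the lock. */
bool toy_dev_do_close(struct toy_dev *d)
{
    d->closing = true;
    d->locked = true;

    /* Models the synchronous cancel waiting for a running handler. */
    if (d->work_running) {
        d->work_running = false;
        toy_conn_timeout(d);
    }

    d->locked = false;
    return !d->deadlocked;
}
```

In this model the close path keeps holding the lock across the cancel and
the handler simply returns. Real concurrent code has a window the sketch
cannot show: a handler that passed the flag check just before the flag was
set would still block on the lock, so the check alone may not be enough.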

> Just try to kill bluetoothd while a bt-mouse or bt-keyboard is connected.

Reproducible, indeed.

> But I have to admit that my patch is likely the wrong solution, as I
> think it will introduce some race conditions. Anyway, I prefer to live
> with them (the race conditions) instead of the deadlock. So for
> inclusion into the kernel a proper solution is needed.
> But as already said, I'm not familiar with the bt-stack and don't know
> about the locking strategies inside the stack, so it's hard for me to
> find my way through the source.

Yes, your fix introduces races. We need to hold the lock there!
Applying your fix would introduce harder-to-trace bugs, even during
runtime, so we need to fix this properly.

> Regards,
>
> Alexander

Thanks
David

2012-04-02 08:29:43

by Alexander Holler

Subject: Re: bluetooth: fix deadlock on device reset and power down

Am 02.04.2012 08:55, schrieb Andrei Emeltchenko:
> Hi Alexander,
>
> On Sat, Mar 31, 2012 at 03:23:38PM +0200, Alexander Holler wrote:
>> I've experienced a deadlock on shutdown using kernel 3.3 and tracked
>> it down. Because I'm not very familiar with the bluetooth stack I'm
>> not sure if the below patch is correct, but it fixed the problem
>> here.
>
> Could you please attach deadlock dump?
>
>>
>> Commit 09fd0de5bd8f8ef3317e5365f92f1a13dcd89aa9 introduced a deadlock:
>>
>> bluetoothd calls ioctl HCIDEVDOWN
>>     hci_sock_ioctl()
>>         hci_dev_close()
>>             hci_dev_do_close()
>>                 hci_dev_lock(hdev);
>>                 inquiry_cache_flush();
>>                 hci_conn_hash_flush();
>>                     hci_conn_del()
>>                         cancel_delayed_work_sync()
>>                             hci_conn_timeout()
>>                                 hci_dev_lock(hdev); /* DEADLOCK */
>
> I am actually not sure that hci_conn_timeout locks hdev. Why do you think
> so?

By reading the source, printk and suffering through the deadlock. It's
especially painful when using a bt-keyboard and systemd, because
systemd tries 4 times (~ some minutes) to kill bluetoothd before it
marks the service as failed and finally continues to shut down.

Just try to kill bluetoothd while a bt-mouse or bt-keyboard is connected.

But I have to admit that my patch is likely the wrong solution, as I
think it will introduce some race conditions. Anyway, I prefer to live
with them (the race conditions) instead of the deadlock. So for
inclusion into the kernel a proper solution is needed.
But as already said, I'm not familiar with the bt-stack and don't know
about the locking strategies inside the stack, so it's hard for me to
find my way through the source.

Regards,

Alexander

2012-04-02 06:55:27

by Andrei Emeltchenko

Subject: Re: bluetooth: fix deadlock on device reset and power down

Hi Alexander,

On Sat, Mar 31, 2012 at 03:23:38PM +0200, Alexander Holler wrote:
> I've experienced a deadlock on shutdown using kernel 3.3 and tracked
> it down. Because I'm not very familiar with the bluetooth stack I'm
> not sure if the below patch is correct, but it fixed the problem
> here.

Could you please attach deadlock dump?

>
> Commit 09fd0de5bd8f8ef3317e5365f92f1a13dcd89aa9 introduced a deadlock:
>
> bluetoothd calls ioctl HCIDEVDOWN
>     hci_sock_ioctl()
>         hci_dev_close()
>             hci_dev_do_close()
>                 hci_dev_lock(hdev);
>                 inquiry_cache_flush();
>                 hci_conn_hash_flush();
>                     hci_conn_del()
>                         cancel_delayed_work_sync()
>                             hci_conn_timeout()
>                                 hci_dev_lock(hdev); /* DEADLOCK */

I am actually not sure that hci_conn_timeout locks hdev. Why do you think
so?

Best regards
Andrei Emeltchenko

> hci_dev_unlock(hdev);
>
> Signed-off-by: Alexander Holler <[email protected]>
> ---
> net/bluetooth/hci_core.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 5aeb624..3428036 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -629,8 +629,8 @@ static int hci_dev_do_close(struct hci_dev *hdev)
>
> hci_dev_lock(hdev);
> inquiry_cache_flush(hdev);
> - hci_conn_hash_flush(hdev);
> hci_dev_unlock(hdev);
> + hci_conn_hash_flush(hdev);
>
> hci_notify(hdev, HCI_DEV_DOWN);
>
> @@ -713,8 +713,8 @@ int hci_dev_reset(__u16 dev)
>
> hci_dev_lock(hdev);
> inquiry_cache_flush(hdev);
> - hci_conn_hash_flush(hdev);
> hci_dev_unlock(hdev);
> + hci_conn_hash_flush(hdev);
>
> if (hdev->flush)
> hdev->flush(hdev);