2022-06-14 18:27:28

by Max Krummenacher

[permalink] [raw]
Subject: [PATCH v1] Revert "Bluetooth: core: Fix missing power_on work cancel on HCI close"

From: Max Krummenacher <[email protected]>

This reverts commit ff7f2926114d3a50f5ffe461a9bce8d761748da5.

The commit ff7f2926114d ("Bluetooth: core: Fix missing power_on work
cancel on HCI close") introduced between v5.18 and v5.19-rc1 makes
going to suspend freeze. v5.19-rc2 is equally affected.

This has been seen on a Colibri iMX6ULL WB which has a Marvell 8997
based WiFi / Bluetooth module connected over SDIO.

With 'v5.18' or 'v5.19-rc1 with said commit reverted' a suspend/resume
cycle looks as follows and the device is functional after the resume:

root@imx6ull:~# rfkill
ID TYPE DEVICE SOFT HARD
0 bluetooth hci0 blocked unblocked
1 wlan phy0 blocked unblocked
root@imx6ull:~# echo enabled > /sys/class/tty/ttymxc0/power/wakeup
root@imx6ull:~# date;echo mem > /sys/power/state;date
Tue Jun 14 14:43:03 UTC 2022
[ 6393.464497] PM: suspend entry (deep)
[ 6393.529398] Filesystems sync: 0.064 seconds
[ 6393.594006] Freezing user space processes ... (elapsed 0.015 seconds) done.
[ 6393.610266] OOM killer disabled.
[ 6393.610285] Freezing remaining freezable tasks ... (elapsed 0.013 seconds) done.
[ 6393.623727] printk: Suspending console(s) (use no_console_suspend to debug)

~~ suspended until console initiates the resume

[ 6394.023552] fec 20b4000.ethernet eth0: Link is Down
[ 6394.049902] PM: suspend devices took 0.300 seconds
[ 6394.091654] Disabling non-boot CPUs ...
[ 6394.565896] PM: resume devices took 0.440 seconds
[ 6394.681350] OOM killer enabled.
[ 6394.681369] Restarting tasks ... done.
[ 6394.741157] random: crng reseeded on system resumption
[ 6394.813135] PM: suspend exit
Tue Jun 14 14:43:11 UTC 2022
[ 6396.403873] fec 20b4000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 6396.404347] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
root@imx6ull:~#

With 'v5.19-rc1' suspend freezes in the suspend phase, i.e. power
consumption is not lowered and no wakeup source initiates a wakup.

root@imx6ull:~# rfkill
ID TYPE DEVICE SOFT HARD
0 bluetooth hci0 blocked unblocked
1 wlan phy0 blocked unblocked
root@imx6ull:~# echo enabled > /sys/class/tty/ttymxc0/power/wakeup
root@imx6ull:~# date;echo mem > /sys/power/state;date
Tue Jun 14 12:40:38 UTC 2022
[ 122.476333] PM: suspend entry (deep)
[ 122.556012] Filesystems sync: 0.079 seconds

~~ no further kernel output

If one first unbinds the bluetooth device driver, suspend / resume works
as expected also with 'v5.19-rc1':

root@imx6ull:~# echo mmc1:0001:2 > /sys/bus/sdio/drivers/btmrvl_sdio/unbind
root@imx6ull:~# rfkill
ID TYPE DEVICE SOFT HARD
1 wlan phy0 blocked unblocked
root@imx6ull:~# echo enabled > /sys/class/tty/ttymxc0/power/wakeup
root@imx6ull:~# date;echo mem > /sys/power/state;date
Tue Jun 14 14:59:26 UTC 2022
[ 123.530310] PM: suspend entry (deep)
[ 123.595432] Filesystems sync: 0.064 seconds
[ 123.672478] Freezing user space processes ... (elapsed 0.028 seconds) done.
[ 123.701848] OOM killer disabled.
[ 123.701869] Freezing remaining freezable tasks ... (elapsed 0.007 seconds) done.
[ 123.709993] printk: Suspending console(s) (use no_console_suspend to debug)
[ 124.097772] fec 20b4000.ethernet eth0: Link is Down
[ 124.124795] PM: suspend devices took 0.280 seconds
[ 124.165893] Disabling non-boot CPUs ...
[ 124.632959] PM: resume devices took 0.430 seconds
[ 124.750164] OOM killer enabled.
[ 124.750187] Restarting tasks ... done.
[ 124.827899] random: crng reseeded on system resumption
[ 124.923183] PM: suspend exit
Tue Jun 14 14:59:31 UTC 2022
[ 127.520321] fec 20b4000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 127.520514] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
root@imx6ull:~#

Signed-off-by: Max Krummenacher <[email protected]>

---

net/bluetooth/hci_core.c | 2 ++
net/bluetooth/hci_sync.c | 1 -
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 59a5c1341c26..19df3905c5f8 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -2675,6 +2675,8 @@ void hci_unregister_dev(struct hci_dev *hdev)
list_del(&hdev->list);
write_unlock(&hci_dev_list_lock);

+ cancel_work_sync(&hdev->power_on);
+
hci_cmd_sync_clear(hdev);

if (!test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks))
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 286d6767f017..1739e8cb3291 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -4088,7 +4088,6 @@ int hci_dev_close_sync(struct hci_dev *hdev)

bt_dev_dbg(hdev, "");

- cancel_work_sync(&hdev->power_on);
cancel_delayed_work(&hdev->power_off);
cancel_delayed_work(&hdev->ncmd_timer);

--
2.20.1


2022-06-14 19:15:37

by bluez.test.bot

[permalink] [raw]
Subject: RE: [v1] Revert "Bluetooth: core: Fix missing power_on work cancel on HCI close"

This is an automated email and please do not reply to this email.

Dear Submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
While preparing the CI tests, the patches you submitted couldn't be applied to the current HEAD of the repository.

----- Output -----
error: patch failed: net/bluetooth/hci_core.c:2675
error: net/bluetooth/hci_core.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch


Please resolve the issue and submit the patches again.


---
Regards,
Linux Bluetooth

2022-06-23 19:51:58

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v1] Revert "Bluetooth: core: Fix missing power_on work cancel on HCI close"

On Tue, 21 Jun 2022 13:55:14 +0200 Francesco Dolcini wrote:
> Marcel, Vasyl,
> any comment on this?

+1 is there an ETA on the fix getting into net?

> On Tue, Jun 14, 2022 at 08:17:06PM +0200, Max Krummenacher wrote:
> > From: Max Krummenacher <[email protected]>
> >
> > This reverts commit ff7f2926114d3a50f5ffe461a9bce8d761748da5.
> >
> > The commit ff7f2926114d ("Bluetooth: core: Fix missing power_on work
> > cancel on HCI close") introduced between v5.18 and v5.19-rc1 makes
> > going to suspend freeze. v5.19-rc2 is equally affected.
> >
> > This has been seen on a Colibri iMX6ULL WB which has a Marvell 8997
> > based WiFi / Bluetooth module connected over SDIO.
> >
> > With 'v5.18' or 'v5.19-rc1 with said commit reverted' a suspend/resume
> > cycle looks as follows and the device is functional after the resume:

2022-07-04 20:11:32

by Mateusz Jończyk

[permalink] [raw]
Subject: Re: [PATCH v1] Revert "Bluetooth: core: Fix missing power_on work cancel on HCI close"

W dniu 14.06.2022 o 20:17, Max Krummenacher pisze:
> From: Max Krummenacher <[email protected]>
>
> This reverts commit ff7f2926114d3a50f5ffe461a9bce8d761748da5.
>
> The commit ff7f2926114d ("Bluetooth: core: Fix missing power_on work
> cancel on HCI close") introduced between v5.18 and v5.19-rc1 makes
> going to suspend freeze. v5.19-rc2 is equally affected.
>
> This has been seen on a Colibri iMX6ULL WB which has a Marvell 8997
> based WiFi / Bluetooth module connected over SDIO.
> [...]

Hello,

commit ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")

causes problems also on my laptop (HP 17-by0001nw with a Realtek Bluetooth adapter).

I have Bluetooth disabled by default on startup (via systemd-rfkill.service ) and
vanilla kernel 5.19.0-rc5 fails to suspend (the screen turns black, but I am then able to
touch a trackpad and log in). Reverting that commit on top of 5.19.0-rc5 fixes the issue.

On bare 5.19.0-rc5, after startup, the kworker/u9:0+hci0 process hangs indefinitely
with this stacktrace (obtained through "cat /proc/163/stack" )

        [<0>] __flush_work+0x143/0x220
        [<0>] __cancel_work_timer+0x122/0x1a0
        [<0>] cancel_work_sync+0x10/0x20
        [<0>] hci_dev_close_sync+0x2a/0x550 [bluetooth]
        [<0>] hci_dev_do_close+0x2a/0x60 [bluetooth]
        [<0>] hci_power_on+0x91/0x200 [bluetooth]
        [<0>] process_one_work+0x21c/0x3c0
        [<0>] worker_thread+0x4a/0x3a0
        [<0>] kthread+0xcf/0xf0
        [<0>] ret_from_fork+0x22/0x30

It appears that the hci_power_on() function calls hci_dev_do_close() in this block:

    if (hci_dev_test_flag(hdev, HCI_RFKILLED) ||
         /* [...] */ ) {
        hci_dev_clear_flag(hdev, HCI_AUTO_OFF);
        hci_dev_do_close(hdev);
    } else /* [...] */

which then calls hci_dev_close_sync(). With the problematic commit, that function
calls

       cancel_work_sync(&hdev->power_on)

which tries to cancel the execution of the hci_power_on() function's itself, which leads to a deadlock.

When trying to suspend, the "/lib/systemd/systemd-sleep suspend" process hangs on

        [<0>] hci_suspend_dev+0x87/0xf0 [bluetooth]
        [<0>] hci_suspend_notifier+0x38/0x80 [bluetooth]
        [<0>] notifier_call_chain_robust+0x5e/0xc0
        [<0>] blocking_notifier_call_chain_robust+0x42/0x60
        [<0>] pm_notifier_call_chain_robust+0x1d/0x40
        [<0>] pm_suspend+0x116/0x5a0
        [<0>] state_store+0x82/0xe0
        [<0>] kobj_attr_store+0x12/0x20
        [<0>] sysfs_kf_write+0x3e/0x50
        [<0>] kernfs_fop_write_iter+0x138/0x1c0
        [<0>] new_sync_write+0x104/0x180
        [<0>] vfs_write+0x1d7/0x260
        [<0>] ksys_write+0x67/0xe0
        [<0>] __x64_sys_write+0x1a/0x20
        [<0>] do_syscall_64+0x3b/0x90
        [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

My device is:

Bus 001 Device 005: ID 0bda:b00b Realtek Semiconductor Corp. Bluetooth Radio

So probably the
commit ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
should be reverted or corrected.

Greetings,

Mateusz Jończyk

> Signed-off-by: Max Krummenacher <[email protected]>
> ---
>
> net/bluetooth/hci_core.c | 2 ++
> net/bluetooth/hci_sync.c | 1 -
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 59a5c1341c26..19df3905c5f8 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -2675,6 +2675,8 @@ void hci_unregister_dev(struct hci_dev *hdev)
> list_del(&hdev->list);
> write_unlock(&hci_dev_list_lock);
>
> + cancel_work_sync(&hdev->power_on);
> +
> hci_cmd_sync_clear(hdev);
>
> if (!test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks))
> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> index 286d6767f017..1739e8cb3291 100644
> --- a/net/bluetooth/hci_sync.c
> +++ b/net/bluetooth/hci_sync.c
> @@ -4088,7 +4088,6 @@ int hci_dev_close_sync(struct hci_dev *hdev)
>
> bt_dev_dbg(hdev, "");
>
> - cancel_work_sync(&hdev->power_on);
> cancel_delayed_work(&hdev->power_off);
> cancel_delayed_work(&hdev->ncmd_timer);
>


2022-07-05 14:06:04

by Vasyl Vavrychuk

[permalink] [raw]
Subject: [PATCH] Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

`cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
commit [1] to ensure that power_on work is canceled after HCI interface
down.

But, in certain cases power_on work function may call hci_dev_close_sync
itself: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync ->
cancel_work_sync(&hdev->power_on), causing deadlock. In particular, this
happens when device is rfkilled on boot. To avoid deadlock, move
power_on work canceling out of hci_dev_do_close/hci_dev_close_sync.

Deadlock introduced by commit [1] was reported in [2,3] as broken
suspend. Suspend did not work because `hdev->req_lock` held as result of
`power_on` work deadlock. In fact, other BT features were not working.
It was not observed when testing [1] since it was verified without
rfkill in place.

NOTE: It is not needed to cancel power_on work from other places where
hci_dev_do_close/hci_dev_close_sync is called in case:
* Requests were serialized due to `hdev->req_workqueue`. The power_on
work is first in that workqueue.
* hci_rfkill_set_block which won't close device anyway until HCI_SETUP
is on.
* hci_sock_release which runs after hci_sock_bind which ensures
HCI_SETUP was cleared.

As result, behaviour is the same as in pre-dd06ed7 commit, except
power_on work cancel added to hci_dev_close.

[1]: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
[2]: https://lore.kernel.org/lkml/[email protected]/
[2]: https://lore.kernel.org/lkml/[email protected]/

Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
Signed-off-by: Vasyl Vavrychuk <[email protected]>
Reported-by: Max Krummenacher <[email protected]>
Reported-by: Mateusz Jonczyk <[email protected]>
---
net/bluetooth/hci_core.c | 3 +++
net/bluetooth/hci_sync.c | 1 -
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 59a5c1341c26..a0f99baafd35 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -571,6 +571,7 @@ int hci_dev_close(__u16 dev)
goto done;
}

+ cancel_work_sync(&hdev->power_on);
if (hci_dev_test_and_clear_flag(hdev, HCI_AUTO_OFF))
cancel_delayed_work(&hdev->power_off);

@@ -2675,6 +2676,8 @@ void hci_unregister_dev(struct hci_dev *hdev)
list_del(&hdev->list);
write_unlock(&hci_dev_list_lock);

+ cancel_work_sync(&hdev->power_on);
+
hci_cmd_sync_clear(hdev);

if (!test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks))
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 286d6767f017..1739e8cb3291 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -4088,7 +4088,6 @@ int hci_dev_close_sync(struct hci_dev *hdev)

bt_dev_dbg(hdev, "");

- cancel_work_sync(&hdev->power_on);
cancel_delayed_work(&hdev->power_off);
cancel_delayed_work(&hdev->ncmd_timer);

--
2.30.2

2022-07-05 14:18:30

by Max Krummenacher

[permalink] [raw]
Subject: Re: [PATCH v1] Revert "Bluetooth: core: Fix missing power_on work cancel on HCI close"

On Mon, Jul 4, 2022 at 9:57 PM Mateusz Jończyk <[email protected]> wrote:
>
> W dniu 14.06.2022 o 20:17, Max Krummenacher pisze:
> > From: Max Krummenacher <[email protected]>
> >
> > This reverts commit ff7f2926114d3a50f5ffe461a9bce8d761748da5.
> >
> > The commit ff7f2926114d ("Bluetooth: core: Fix missing power_on work
> > cancel on HCI close") introduced between v5.18 and v5.19-rc1 makes
> > going to suspend freeze. v5.19-rc2 is equally affected.

The following follow up commit fixes the issue for me without the revert.
https://lore.kernel.org/all/[email protected]/

Thanks
Max

2022-07-05 14:19:10

by Max Krummenacher

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

On Tue, Jul 5, 2022 at 3:00 PM Vasyl Vavrychuk
<[email protected]> wrote:
>
> `cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
> commit [1] to ensure that power_on work is canceled after HCI interface
> down.
>
> But, in certain cases power_on work function may call hci_dev_close_sync
> itself: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync ->
> cancel_work_sync(&hdev->power_on), causing deadlock. In particular, this
> happens when device is rfkilled on boot. To avoid deadlock, move
> power_on work canceling out of hci_dev_do_close/hci_dev_close_sync.
>
> Deadlock introduced by commit [1] was reported in [2,3] as broken
> suspend. Suspend did not work because `hdev->req_lock` held as result of
> `power_on` work deadlock. In fact, other BT features were not working.
> It was not observed when testing [1] since it was verified without
> rfkill in place.
>
> NOTE: It is not needed to cancel power_on work from other places where
> hci_dev_do_close/hci_dev_close_sync is called in case:
> * Requests were serialized due to `hdev->req_workqueue`. The power_on
> work is first in that workqueue.
> * hci_rfkill_set_block which won't close device anyway until HCI_SETUP
> is on.
> * hci_sock_release which runs after hci_sock_bind which ensures
> HCI_SETUP was cleared.
>
> As result, behaviour is the same as in pre-dd06ed7 commit, except
> power_on work cancel added to hci_dev_close.
>
> [1]: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> [2]: https://lore.kernel.org/lkml/[email protected]/
> [2]: https://lore.kernel.org/lkml/[email protected]/
>
> Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> Signed-off-by: Vasyl Vavrychuk <[email protected]>
> Reported-by: Max Krummenacher <[email protected]>
> Reported-by: Mateusz Jonczyk <[email protected]>
> ---
> net/bluetooth/hci_core.c | 3 +++
> net/bluetooth/hci_sync.c | 1 -
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 59a5c1341c26..a0f99baafd35 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -571,6 +571,7 @@ int hci_dev_close(__u16 dev)
> goto done;
> }
>
> + cancel_work_sync(&hdev->power_on);
> if (hci_dev_test_and_clear_flag(hdev, HCI_AUTO_OFF))
> cancel_delayed_work(&hdev->power_off);
>
> @@ -2675,6 +2676,8 @@ void hci_unregister_dev(struct hci_dev *hdev)
> list_del(&hdev->list);
> write_unlock(&hci_dev_list_lock);
>
> + cancel_work_sync(&hdev->power_on);
> +
> hci_cmd_sync_clear(hdev);
>
> if (!test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks))
> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> index 286d6767f017..1739e8cb3291 100644
> --- a/net/bluetooth/hci_sync.c
> +++ b/net/bluetooth/hci_sync.c
> @@ -4088,7 +4088,6 @@ int hci_dev_close_sync(struct hci_dev *hdev)
>
> bt_dev_dbg(hdev, "");
>
> - cancel_work_sync(&hdev->power_on);
> cancel_delayed_work(&hdev->power_off);
> cancel_delayed_work(&hdev->ncmd_timer);
>
> --
> 2.30.2
>

This fixes the issue I described in [1]. I.e. The kernel no longer
freezes while going to suspend.
Tested-by: Max Krummenacher <[email protected]>

Thanks!
Max

2022-07-05 14:21:09

by bluez.test.bot

[permalink] [raw]
Subject: RE: Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

This is an automated email and please do not reply to this email.

Dear Submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
While preparing the CI tests, the patches you submitted couldn't be applied to the current HEAD of the repository.

----- Output -----
error: patch failed: net/bluetooth/hci_core.c:2675
error: net/bluetooth/hci_core.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch


Please resolve the issue and submit the patches again.


---
Regards,
Linux Bluetooth

2022-07-05 17:32:09

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

Hi,

On Tue, Jul 5, 2022 at 8:14 AM Francesco Dolcini
<[email protected]> wrote:
>
> Hello Vasyl,
>
> On Tue, Jul 05, 2022 at 03:59:31PM +0300, Vasyl Vavrychuk wrote:
> > Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
>
> This fixes tag is broken, dd06ed7ad057 does not exist on
> torvalds/master, and the `commit` word should be removed.
>
> Should be:
>
> Fixes: ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")

Ive rebased the patch on top of bluetooth-next and fixed the hash,
lets see if passes CI I might just go ahead and push it.

--
Luiz Augusto von Dentz

2022-07-05 18:48:24

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

On Tue, 5 Jul 2022 10:26:08 -0700 Luiz Augusto von Dentz wrote:
> On Tue, Jul 5, 2022 at 8:14 AM Francesco Dolcini
> <[email protected]> wrote:
> >
> > Hello Vasyl,
> >
> > On Tue, Jul 05, 2022 at 03:59:31PM +0300, Vasyl Vavrychuk wrote:
> > > Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> >
> > This fixes tag is broken, dd06ed7ad057 does not exist on
> > torvalds/master, and the `commit` word should be removed.
> >
> > Should be:
> >
> > Fixes: ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
>
> Ive rebased the patch on top of bluetooth-next and fixed the hash,
> lets see if passes CI I might just go ahead and push it.

Thanks for pushing it along, the final version can got thru bluetooth ->
-> net and into 5.19, right?

2022-07-05 18:49:40

by Mateusz Jończyk

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

W dniu 5.07.2022 o 14:59, Vasyl Vavrychuk pisze:
> `cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
> commit [1] to ensure that power_on work is canceled after HCI interface
> down.
>
> But, in certain cases power_on work function may call hci_dev_close_sync
> itself: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync ->
> cancel_work_sync(&hdev->power_on), causing deadlock. In particular, this
> happens when device is rfkilled on boot. To avoid deadlock, move
> power_on work canceling out of hci_dev_do_close/hci_dev_close_sync.
>
> Deadlock introduced by commit [1] was reported in [2,3] as broken
> suspend. Suspend did not work because `hdev->req_lock` held as result of
> `power_on` work deadlock. In fact, other BT features were not working.
> It was not observed when testing [1] since it was verified without
> rfkill in place.
>
> NOTE: It is not needed to cancel power_on work from other places where
> hci_dev_do_close/hci_dev_close_sync is called in case:
> * Requests were serialized due to `hdev->req_workqueue`. The power_on
> work is first in that workqueue.
> * hci_rfkill_set_block which won't close device anyway until HCI_SETUP
> is on.
> * hci_sock_release which runs after hci_sock_bind which ensures
> HCI_SETUP was cleared.
>
> As result, behaviour is the same as in pre-dd06ed7 commit, except
> power_on work cancel added to hci_dev_close.
>
> [1]: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> [2]: https://lore.kernel.org/lkml/[email protected]/
> [2]: https://lore.kernel.org/lkml/[email protected]/
>
> Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> Signed-off-by: Vasyl Vavrychuk <[email protected]>
> Reported-by: Max Krummenacher <[email protected]>
> Reported-by: Mateusz Jonczyk <[email protected]>

Works well: suspend (with bluetooth on and also off), hibernation, sending files, rfkill.

Thank you.

Reported-and-tested-by: Mateusz Jończyk <[email protected]>

Greetings,

Mateusz Jończyk

2022-07-05 19:05:23

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

Hi Jakub,

On Tue, Jul 5, 2022 at 11:38 AM Jakub Kicinski <[email protected]> wrote:
>
> On Tue, 5 Jul 2022 10:26:08 -0700 Luiz Augusto von Dentz wrote:
> > On Tue, Jul 5, 2022 at 8:14 AM Francesco Dolcini
> > <[email protected]> wrote:
> > >
> > > Hello Vasyl,
> > >
> > > On Tue, Jul 05, 2022 at 03:59:31PM +0300, Vasyl Vavrychuk wrote:
> > > > Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> > >
> > > This fixes tag is broken, dd06ed7ad057 does not exist on
> > > torvalds/master, and the `commit` word should be removed.
> > >
> > > Should be:
> > >
> > > Fixes: ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> >
> > Ive rebased the patch on top of bluetooth-next and fixed the hash,
> > lets see if passes CI I might just go ahead and push it.
>
> Thanks for pushing it along, the final version can got thru bluetooth ->
> -> net and into 5.19, right?

Yep, I will send the pull request in a moment.

--
Luiz Augusto von Dentz

2022-07-05 19:14:10

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

On Tue, 5 Jul 2022 12:00:43 -0700 Luiz Augusto von Dentz wrote:
> > > Ive rebased the patch on top of bluetooth-next and fixed the hash,
> > > lets see if passes CI I might just go ahead and push it.
> >
> > Thanks for pushing it along, the final version can got thru bluetooth ->
> > -> net and into 5.19, right?
>
> Yep, I will send the pull request in a moment.

Perfect, thank you!!

2022-07-05 21:53:57

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

Hello:

This patch was applied to netdev/net.git (master)
by Luiz Augusto von Dentz <[email protected]>:

On Tue, 5 Jul 2022 15:59:31 +0300 you wrote:
> `cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
> commit [1] to ensure that power_on work is canceled after HCI interface
> down.
>
> But, in certain cases power_on work function may call hci_dev_close_sync
> itself: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync ->
> cancel_work_sync(&hdev->power_on), causing deadlock. In particular, this
> happens when device is rfkilled on boot. To avoid deadlock, move
> power_on work canceling out of hci_dev_do_close/hci_dev_close_sync.
>
> [...]

Here is the summary with links:
- Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.
https://git.kernel.org/netdev/net/c/e36bea6e78ab

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html