2021-05-14 10:02:17

by Kai-Heng Feng

[permalink] [raw]
Subject: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
stops working:
[ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed
[ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on
[ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[ 545.660350] usb 3-10: device descriptor read/64, error -110
[ 561.283530] usb 3-10: device descriptor read/64, error -110
[ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
[ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
[ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
[ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
[ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)

Or kernel panics because other workqueues already freed skb:
[ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2048.663775] #PF: supervisor read access in kernel mode
[ 2048.663779] #PF: error_code(0x0000) - not-present page
[ 2048.663782] PGD 0 P4D 0
[ 2048.663787] Oops: 0000 [#1] SMP NOPTI
[ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20
[ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
[ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
[ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
07 83 f8 01
[ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
[ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
[ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
[ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
[ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
[ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
[ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
[ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
[ 2048.663856] PKRU: 55555554
[ 2048.663859] Call Trace:
[ 2048.663865] ? skb_release_head_state+0x5e/0x80
[ 2048.663873] kfree_skb+0x2f/0xb0
[ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb]
[ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth]
[ 2048.663954] ? __cond_resched+0x1a/0x50
[ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth]
[ 2048.664007] rfkill_set_block+0x98/0x170
[ 2048.664016] rfkill_fop_write+0x136/0x1e0
[ 2048.664022] vfs_write+0xc7/0x260
[ 2048.664030] ksys_write+0xb1/0xe0
[ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0
[ 2048.664042] __x64_sys_write+0x1a/0x20
[ 2048.664048] do_syscall_64+0x40/0xb0
[ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2048.664060] RIP: 0033:0x7fe02ac23c27
[ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
[ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
[ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
[ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff

So move the shutdown callback to a place where workqueues are either
flushed or cancelled to resolve the issue.

Signed-off-by: Kai-Heng Feng <[email protected]>
---
v2:
- Rebased on bluetooth-next.

net/bluetooth/hci_core.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 7baf93eda936..6eedf334f943 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev)

BT_DBG("%s %p", hdev->name, hdev);

- if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
- !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
- test_bit(HCI_UP, &hdev->flags)) {
- /* Execute vendor specific shutdown routine */
- if (hdev->shutdown)
- hdev->shutdown(hdev);
- }
-
cancel_delayed_work(&hdev->power_off);
cancel_delayed_work(&hdev->ncmd_timer);

@@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
clear_bit(HCI_INIT, &hdev->flags);
}

+ if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+ !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+ test_bit(HCI_UP, &hdev->flags)) {
+ /* Execute vendor specific shutdown routine */
+ if (hdev->shutdown)
+ hdev->shutdown(hdev);
+ }
+
/* flush cmd work */
flush_work(&hdev->cmd_work);

--
2.30.2



2021-07-28 15:28:14

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

On Wed, Jul 28, 2021 at 9:50 PM Mattijs Korpershoek
<[email protected]> wrote:
>
> Hi Kai-Heng, Marcel,
>
> Marcel Holtmann <[email protected]> writes:
>
> > Hi Kai-Heng,
> >
> >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
> >> stops working:
> >> [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed
> >> [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on
> >> [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> >> [ 545.660350] usb 3-10: device descriptor read/64, error -110
> >> [ 561.283530] usb 3-10: device descriptor read/64, error -110
> >> [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> >> [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
> >> [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
> >> [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> >> [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
> >> [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> >>
> >> Or kernel panics because other workqueues already freed skb:
> >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >> [ 2048.663775] #PF: supervisor read access in kernel mode
> >> [ 2048.663779] #PF: error_code(0x0000) - not-present page
> >> [ 2048.663782] PGD 0 P4D 0
> >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
> >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20
> >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
> >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
> >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
> >> 07 83 f8 01
> >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
> >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
> >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
> >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
> >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
> >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
> >> [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
> >> [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
> >> [ 2048.663856] PKRU: 55555554
> >> [ 2048.663859] Call Trace:
> >> [ 2048.663865] ? skb_release_head_state+0x5e/0x80
> >> [ 2048.663873] kfree_skb+0x2f/0xb0
> >> [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb]
> >> [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth]
> >> [ 2048.663954] ? __cond_resched+0x1a/0x50
> >> [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth]
> >> [ 2048.664007] rfkill_set_block+0x98/0x170
> >> [ 2048.664016] rfkill_fop_write+0x136/0x1e0
> >> [ 2048.664022] vfs_write+0xc7/0x260
> >> [ 2048.664030] ksys_write+0xb1/0xe0
> >> [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0
> >> [ 2048.664042] __x64_sys_write+0x1a/0x20
> >> [ 2048.664048] do_syscall_64+0x40/0xb0
> >> [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27
> >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
> >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
> >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
> >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
> >>
> >> So move the shutdown callback to a place where workqueues are either
> >> flushed or cancelled to resolve the issue.
> >>
> >> Signed-off-by: Kai-Heng Feng <[email protected]>
> >> ---
> >> v2:
> >> - Rebased on bluetooth-next.
> >>
> >> net/bluetooth/hci_core.c | 16 ++++++++--------
> >> 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > patch has been applied to bluetooth-next tree.
>
> This patch seems to introduce a regression in the btmtksdio driver.
> With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin:
>
> i500-pumpkin login: root
> root@i500-pumpkin:~# uname -a
> Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux
> root@i500-pumpkin:~# hciconfig hci0 up
> Can't init device hci0: Connection timed out (110)
> root@i500-pumpkin:~# hciconfig hci0 down
> root@i500-pumpkin:~# hciconfig hci0 up
> Can't init device hci0: Input/output error (5)
>
> Reverting it fixes the above issue.
> Any suggestion on how to fix this without touching hci_core ?
> Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips.

Can you please attach dmesg? Also, full ftrace log on btmtksdio can
also be helpful.

Kai-Heng

>
> Thanks,
> Mattijs Korpershoek
>
>
> >
> > Regards
> >
> > Marcel

2021-07-30 11:42:27

by Mattijs Korpershoek

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Hi Kai-Heng,

Kai-Heng Feng <[email protected]> writes:

> On Wed, Jul 28, 2021 at 9:50 PM Mattijs Korpershoek
> <[email protected]> wrote:
>>
>> Hi Kai-Heng, Marcel,
>>
>> Marcel Holtmann <[email protected]> writes:
>>
>> > Hi Kai-Heng,
>> >
>> >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
>> >> stops working:
>> >> [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed
>> >> [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on
>> >> [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
>> >> [ 545.660350] usb 3-10: device descriptor read/64, error -110
>> >> [ 561.283530] usb 3-10: device descriptor read/64, error -110
>> >> [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
>> >> [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
>> >> [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
>> >> [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
>> >> [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
>> >> [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
>> >>
>> >> Or kernel panics because other workqueues already freed skb:
>> >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> >> [ 2048.663775] #PF: supervisor read access in kernel mode
>> >> [ 2048.663779] #PF: error_code(0x0000) - not-present page
>> >> [ 2048.663782] PGD 0 P4D 0
>> >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
>> >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20
>> >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
>> >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
>> >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
>> >> 07 83 f8 01
>> >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
>> >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
>> >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
>> >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
>> >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
>> >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
>> >> [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
>> >> [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
>> >> [ 2048.663856] PKRU: 55555554
>> >> [ 2048.663859] Call Trace:
>> >> [ 2048.663865] ? skb_release_head_state+0x5e/0x80
>> >> [ 2048.663873] kfree_skb+0x2f/0xb0
>> >> [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb]
>> >> [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth]
>> >> [ 2048.663954] ? __cond_resched+0x1a/0x50
>> >> [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth]
>> >> [ 2048.664007] rfkill_set_block+0x98/0x170
>> >> [ 2048.664016] rfkill_fop_write+0x136/0x1e0
>> >> [ 2048.664022] vfs_write+0xc7/0x260
>> >> [ 2048.664030] ksys_write+0xb1/0xe0
>> >> [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0
>> >> [ 2048.664042] __x64_sys_write+0x1a/0x20
>> >> [ 2048.664048] do_syscall_64+0x40/0xb0
>> >> [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae
>> >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27
>> >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>> >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>> >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
>> >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
>> >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
>> >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
>> >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
>> >>
>> >> So move the shutdown callback to a place where workqueues are either
>> >> flushed or cancelled to resolve the issue.
>> >>
>> >> Signed-off-by: Kai-Heng Feng <[email protected]>
>> >> ---
>> >> v2:
>> >> - Rebased on bluetooth-next.
>> >>
>> >> net/bluetooth/hci_core.c | 16 ++++++++--------
>> >> 1 file changed, 8 insertions(+), 8 deletions(-)
>> >
>> > patch has been applied to bluetooth-next tree.
>>
>> This patch seems to introduce a regression in the btmtksdio driver.
>> With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin:
>>
>> i500-pumpkin login: root
>> root@i500-pumpkin:~# uname -a
>> Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Connection timed out (110)
>> root@i500-pumpkin:~# hciconfig hci0 down
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Input/output error (5)
>>
>> Reverting it fixes the above issue.
>> Any suggestion on how to fix this without touching hci_core ?
>> Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips.
>
> Can you please attach dmesg? Also, full ftrace log on btmtksdio can
> also be helpful.
Thank you for your help. Sorry I did not post the logs previously.

dmesg: https://pastebin.com/tpWDNyQr
ftrace on btmtksdio: https://pastebin.com/jmhvmwUw

Mattijs
>
> Kai-Heng
>
>>
>> Thanks,
>> Mattijs Korpershoek
>>
>>
>> >
>> > Regards
>> >
>> > Marcel

2021-08-03 06:44:21

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Hi Mattijs,

On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
<[email protected]> wrote:
>
> Hi Kai-Heng,

[snipped]

> Thank you for your help. Sorry I did not post the logs previously.
>
> dmesg: https://pastebin.com/tpWDNyQr
> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw

Seems like btmtksdio needs shudown() to be called before flush().
Since the order was there for a very long time, changing the calling
order indeed can break what driver expects.
Can you please test the following patch:
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d..a61e610a400c 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
aosp_do_close(hdev);
msft_do_close(hdev);

+ if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+ !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+ test_bit(HCI_UP, &hdev->flags)) {
+ /* Execute vendor specific shutdown routine */
+ if (hdev->shutdown)
+ hdev->shutdown(hdev);
+ }
+
if (hdev->flush)
hdev->flush(hdev);

@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
clear_bit(HCI_INIT, &hdev->flags);
}

- if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
- !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
- test_bit(HCI_UP, &hdev->flags)) {
- /* Execute vendor specific shutdown routine */
- if (hdev->shutdown)
- hdev->shutdown(hdev);
- }
-
/* flush cmd work */
flush_work(&hdev->cmd_work);

Kai-Heng

>
> Mattijs
> >
> > Kai-Heng
> >
> >>
> >> Thanks,
> >> Mattijs Korpershoek
> >>
> >>
> >> >
> >> > Regards
> >> >
> >> > Marcel

2021-08-03 06:47:09

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <[email protected]> wrote:
>
> On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote:
> > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
> > stops working:
> > [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed
> > [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on
> > [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > [ 545.660350] usb 3-10: device descriptor read/64, error -110
> > [ 561.283530] usb 3-10: device descriptor read/64, error -110
> > [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
> > [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
> > [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> > [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
> > [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> >
> > Or kernel panics because other workqueues already freed skb:
> > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [ 2048.663775] #PF: supervisor read access in kernel mode
> > [ 2048.663779] #PF: error_code(0x0000) - not-present page
> > [ 2048.663782] PGD 0 P4D 0
> > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
> > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20
> > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
> > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
> > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
> > 07 83 f8 01
> > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
> > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
> > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
> > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
> > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
> > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
> > [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
> > [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
> > [ 2048.663856] PKRU: 55555554
> > [ 2048.663859] Call Trace:
> > [ 2048.663865] ? skb_release_head_state+0x5e/0x80
> > [ 2048.663873] kfree_skb+0x2f/0xb0
> > [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb]
> > [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth]
> > [ 2048.663954] ? __cond_resched+0x1a/0x50
> > [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth]
> > [ 2048.664007] rfkill_set_block+0x98/0x170
> > [ 2048.664016] rfkill_fop_write+0x136/0x1e0
> > [ 2048.664022] vfs_write+0xc7/0x260
> > [ 2048.664030] ksys_write+0xb1/0xe0
> > [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0
> > [ 2048.664042] __x64_sys_write+0x1a/0x20
> > [ 2048.664048] do_syscall_64+0x40/0xb0
> > [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 2048.664060] RIP: 0033:0x7fe02ac23c27
> > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
> > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
> > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
> > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
> >
> > So move the shutdown callback to a place where workqueues are either
> > flushed or cancelled to resolve the issue.
> >
> > Signed-off-by: Kai-Heng Feng <[email protected]>
> > ---
> > v2:
> > - Rebased on bluetooth-next.
> >
> > net/bluetooth/hci_core.c | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 7baf93eda936..6eedf334f943 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >
> > BT_DBG("%s %p", hdev->name, hdev);
> >
> > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > - test_bit(HCI_UP, &hdev->flags)) {
> > - /* Execute vendor specific shutdown routine */
> > - if (hdev->shutdown)
> > - hdev->shutdown(hdev);
> > - }
> > -
> > cancel_delayed_work(&hdev->power_off);
> > cancel_delayed_work(&hdev->ncmd_timer);
> >
> > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > clear_bit(HCI_INIT, &hdev->flags);
> > }
> >
> > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > + test_bit(HCI_UP, &hdev->flags)) {
> > + /* Execute vendor specific shutdown routine */
> > + if (hdev->shutdown)
> > + hdev->shutdown(hdev);
> > + }
> > +
> > /* flush cmd work */
> > flush_work(&hdev->cmd_work);
> >
> > --
> > 2.30.2
>
> btusb_shutdown_intel_new
> __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
> __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout);
> hci_req_run_skb(&req, hci_req_sync_complete);
>
> hci_req_sync_complete
> if (skb)
> hdev->req_skb = skb_get(skb);
>
> Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> driver side, I doubt this patch is the correct fix as it is.

Some workqueues are still active.
The shutdown() should be called at least after hci_request_cancel_all().

Kai-Heng

2021-08-03 08:24:08

by Mattijs Korpershoek

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Hi Kai-Heng,

Kai-Heng Feng <[email protected]> writes:

> Hi Mattijs,
>
> On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
> <[email protected]> wrote:
>>
>> Hi Kai-Heng,
>
> [snipped]
>
>> Thank you for your help. Sorry I did not post the logs previously.
>>
>> dmesg: https://pastebin.com/tpWDNyQr
>> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw
>
> Seems like btmtksdio needs shudown() to be called before flush().
> Since the order was there for a very long time, changing the calling
> order indeed can break what driver expects.
> Can you please test the following patch:
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 2560ed2f144d..a61e610a400c 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> aosp_do_close(hdev);
> msft_do_close(hdev);
>
> + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> + test_bit(HCI_UP, &hdev->flags)) {
> + /* Execute vendor specific shutdown routine */
> + if (hdev->shutdown)
> + hdev->shutdown(hdev);
> + }
> +
> if (hdev->flush)
> hdev->flush(hdev);
>
> @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> clear_bit(HCI_INIT, &hdev->flags);
> }
>
> - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> - test_bit(HCI_UP, &hdev->flags)) {
> - /* Execute vendor specific shutdown routine */
> - if (hdev->shutdown)
> - hdev->shutdown(hdev);
> - }
> -
> /* flush cmd work */
> flush_work(&hdev->cmd_work);

Thanks for the patch and your help.
I've tried it, but it seems that it does not improve for me.
I'm still observing:

i500-pumpkin login: root
root@i500-pumpkin:~# hciconfig hci0 up
Can't init device hci0: Connection timed out (110)

Logs for this session:
dmesg: https://pastebin.com/iAFk5Tzi
ftrace: https://pastebin.com/kEMWSYrE


>
> Kai-Heng
>
>>
>> Mattijs
>> >
>> > Kai-Heng
>> >
>> >>
>> >> Thanks,
>> >> Mattijs Korpershoek
>> >>
>> >>
>> >> >
>> >> > Regards
>> >> >
>> >> > Marcel

2021-08-04 15:00:28

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

On Tue, Aug 3, 2021 at 3:47 PM Hillf Danton <[email protected]> wrote:
>
> On Tue, 3 Aug 2021 14:45:07 +0800 Kai-Heng Feng wrote:
> >On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <[email protected]> wrote:
> >>
> >> Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> >> driver side, I doubt this patch is the correct fix as it is.
> >
> >Some workqueues are still active.
> >The shutdown() should be called at least after hci_request_cancel_all().
>
> What is muddy then is how active workqueues prevent skb_get from protecting
> kfree_skb. Can you spot what workqueue it is?

I managed to reproduce the issue with another kernel splat:
------------[ cut here ]------------
kernel BUG at mm/slub.c:321!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 2 PID: 2208 Comm: kworker/u9:3 Not tainted 5.14.0-rc4+ #16
Hardware name: HP HP ProBook 650 G8 Notebook PC/87ED, BIOS T74 Ver.
01.03.04 01/07/2021
Workqueue: hci0 discov_update [bluetooth]
RIP: 0010:__slab_free+0x20c/0x3a0
Code: 00 44 0f b6 54 24 1a 8b 74 24 14 44 0f b6 4c 24 1b 44 8b 44 24
1c 48 89 44 24 08 48 8b 54 24 20 48 8b 7c 24 28 e9 ad fe ff ff <0f> 0b
49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24
RSP: 0018:ffffaa0e4164fc50 EFLAGS: 00010246
RAX: ffff9cc9a217e668 RBX: ffff9cc9a217e600 RCX: ffff9cc9a217e600
RDX: 000000008010000e RSI: ffffd09044885f80 RDI: ffff9cc980e96500
RBP: ffffaa0e4164fd00 R08: 0000000000000001 R09: ffffffff885b3a4e
R10: ffff9cc999aab800 R11: ffff9cc9a217e600 R12: ffffd09044885f80
R13: ffff9cc9a217e600 R14: ffff9cc980e96500 R15: ffff9cc9a217e600
FS: 0000000000000000(0000) GS:ffff9cca2b900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe164d5b98 CR3: 000000013f410002 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
? psi_task_switch+0xc3/0x1e0
? __switch_to_asm+0x36/0x70
? skb_free_head+0x67/0x80
kmem_cache_free+0x370/0x3d0
? kfree_skbmem+0x4e/0x90
kfree_skbmem+0x4e/0x90
kfree_skb+0x47/0xb0
__hci_req_sync+0x134/0x2a0 [bluetooth]
? wait_woken+0x70/0x70
discov_update+0x2ae/0x310 [bluetooth]
process_one_work+0x21d/0x3c0
worker_thread+0x53/0x420
? process_one_work+0x3c0/0x3c0
kthread+0x127/0x150
? set_kthread_struct+0x50/0x50
ret_from_fork+0x1f/0x30
Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg bnep
nls_iso8859_1 snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common
snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl
snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_pci snd_sof
snd_sof_xtensa_dsp snd_sof_intel_hda snd_hda_ext_core
snd_soc_acpi_intel_match snd_soc_acpi ledtrig_audio snd_soc_core
snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core
snd_pcm snd_seq i915 snd_seq_device snd_timer hp_wmi intel_tcc_cooling
x86_pkg_temp_thermal intel_powerclamp i2c_algo_bit coretemp joydev
kvm_intel ttm mei_hdcp intel_rapl_msr platform_profile wmi_bmof kvm
uvcvideo crct10dif_pclmul btusb videobuf2_vmalloc videobuf2_memops
drm_kms_helper btrtl videobuf2_v4l2 crc32_pclmul btbcm
ghash_clmulni_intel input_leds videobuf2_common btintel snd videodev
syscopyarea sysfillrect sysimgblt aesni_intel mc serio_raw fb_sys_fops
bluetooth crypto_simd
cec cryptd intel_cstate ecdh_generic efi_pstore ecc rc_core
hid_multitouch processor_thermal_device_pci_legacy mei_me
intel_soc_dts_iosf processor_thermal_device mei soundcore
processor_thermal_rfim ee1004 processor_thermal_mbox
processor_thermal_rapl intel_pmt_telemetry intel_rapl_common
intel_pmt_class ucsi_acpi typec_ucsi typec wmi soc_button_array
int3403_thermal int340x_thermal_zone video int3400_thermal
acpi_thermal_rel acpi_pad mac_hid intel_hid sparse_keymap sch_fq_codel
msr parport_pc ppdev lp drm parport ip_tables x_tables autofs4
hid_generic nvme nvme_core intel_lpss_pci e1000e intel_lpss i2c_i801
idma64 i2c_smbus xhci_pci xhci_pci_renesas vmd intel_pmt i2c_hid_acpi
i2c_hid hid pinctrl_tigerlake
---[ end trace c09445d4697039ed ]---

So hci_request_cancel_all() -> cancel_work_sync(&hdev->discov_update)
and can prevent the race from happening.

And the kernel splat is just one symptom of the issue, most of the
time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in
dmesg.

Kai-Heng

2021-08-04 15:03:37

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek
<[email protected]> wrote:
>
> Hi Kai-Heng,
>
> Kai-Heng Feng <[email protected]> writes:
>
> > Hi Mattijs,
> >
> > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
> > <[email protected]> wrote:
> >>
> >> Hi Kai-Heng,
> >
> > [snipped]
> >
> >> Thank you for your help. Sorry I did not post the logs previously.
> >>
> >> dmesg: https://pastebin.com/tpWDNyQr
> >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw
> >
> > Seems like btmtksdio needs shudown() to be called before flush().
> > Since the order was there for a very long time, changing the calling
> > order indeed can break what driver expects.
> > Can you please test the following patch:
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 2560ed2f144d..a61e610a400c 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > aosp_do_close(hdev);
> > msft_do_close(hdev);
> >
> > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > + test_bit(HCI_UP, &hdev->flags)) {
> > + /* Execute vendor specific shutdown routine */
> > + if (hdev->shutdown)
> > + hdev->shutdown(hdev);
> > + }
> > +
> > if (hdev->flush)
> > hdev->flush(hdev);
> >
> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > clear_bit(HCI_INIT, &hdev->flags);
> > }
> >
> > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > - test_bit(HCI_UP, &hdev->flags)) {
> > - /* Execute vendor specific shutdown routine */
> > - if (hdev->shutdown)
> > - hdev->shutdown(hdev);
> > - }
> > -
> > /* flush cmd work */
> > flush_work(&hdev->cmd_work);
>
> Thanks for the patch and your help.
> I've tried it, but it seems that it does not improve for me.
> I'm still observing:
>
> i500-pumpkin login: root
> root@i500-pumpkin:~# hciconfig hci0 up
> Can't init device hci0: Connection timed out (110)
>
> Logs for this session:
> dmesg: https://pastebin.com/iAFk5Tzi
> ftrace: https://pastebin.com/kEMWSYrE

Thanks for the testing!
What about moving the shutdown() part right after hci_req_sync_lock()
so tx/rx can still work:

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d4..be3113fb7d4b0 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
hci_request_cancel_all(hdev);
hci_req_sync_lock(hdev);

+ if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+ !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+ test_bit(HCI_UP, &hdev->flags)) {
+ /* Execute vendor specific shutdown routine */
+ if (hdev->shutdown)
+ hdev->shutdown(hdev);
+ }
+
if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
cancel_delayed_work_sync(&hdev->cmd_timer);
hci_req_sync_unlock(hdev);
@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
clear_bit(HCI_INIT, &hdev->flags);
}

- if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
- !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
- test_bit(HCI_UP, &hdev->flags)) {
- /* Execute vendor specific shutdown routine */
- if (hdev->shutdown)
- hdev->shutdown(hdev);
- }
-
/* flush cmd work */
flush_work(&hdev->cmd_work);





>
>
> >
> > Kai-Heng
> >
> >>
> >> Mattijs
> >> >
> >> > Kai-Heng
> >> >
> >> >>
> >> >> Thanks,
> >> >> Mattijs Korpershoek
> >> >>
> >> >>
> >> >> >
> >> >> > Regards
> >> >> >
> >> >> > Marcel

2021-08-05 05:13:57

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

On Thu, Aug 5, 2021 at 11:00 AM Hillf Danton <[email protected]> wrote:
>
> On Wed, 4 Aug 2021 22:35:43 +0800 Kai-Heng Feng wrote:
> >On Tue, Aug 3, 2021 at 3:47 PM Hillf Danton <[email protected]> wrote:
> >>
> >> On Tue, 3 Aug 2021 14:45:07 +0800 Kai-Heng Feng wrote:
> >> >On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <[email protected]> wrote:
> >> >>
> >> >> Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> >> >> driver side, I doubt this patch is the correct fix as it is.
> >> >
> >> >Some workqueues are still active.
> >> >The shutdown() should be called at least after hci_request_cancel_all().
> >>
> >> What is muddy then is how active workqueues prevent skb_get from protecting
> >> kfree_skb. Can you spot what workqueue it is?
> >
> >I managed to reproduce the issue with another kernel splat:
> >------------[ cut here ]------------
> >kernel BUG at mm/slub.c:321!
> >invalid opcode: 0000 [#1] SMP NOPTI
> >CPU: 2 PID: 2208 Comm: kworker/u9:3 Not tainted 5.14.0-rc4+ #16
> >Hardware name: HP HP ProBook 650 G8 Notebook PC/87ED, BIOS T74 Ver.
> >01.03.04 01/07/2021
> >Workqueue: hci0 discov_update [bluetooth]
> >RIP: 0010:__slab_free+0x20c/0x3a0
> >Code: 00 44 0f b6 54 24 1a 8b 74 24 14 44 0f b6 4c 24 1b 44 8b 44 24
> >1c 48 89 44 24 08 48 8b 54 24 20 48 8b 7c 24 28 e9 ad fe ff ff <0f> 0b
> >49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24
> >RSP: 0018:ffffaa0e4164fc50 EFLAGS: 00010246
> >RAX: ffff9cc9a217e668 RBX: ffff9cc9a217e600 RCX: ffff9cc9a217e600
> >RDX: 000000008010000e RSI: ffffd09044885f80 RDI: ffff9cc980e96500
> >RBP: ffffaa0e4164fd00 R08: 0000000000000001 R09: ffffffff885b3a4e
> >R10: ffff9cc999aab800 R11: ffff9cc9a217e600 R12: ffffd09044885f80
> >R13: ffff9cc9a217e600 R14: ffff9cc980e96500 R15: ffff9cc9a217e600
> >FS: 0000000000000000(0000) GS:ffff9cca2b900000(0000) knlGS:0000000000000000
> >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >CR2: 00007ffe164d5b98 CR3: 000000013f410002 CR4: 0000000000770ee0
> >PKRU: 55555554
> >Call Trace:
> > ? psi_task_switch+0xc3/0x1e0
> > ? __switch_to_asm+0x36/0x70
> > ? skb_free_head+0x67/0x80
> > kmem_cache_free+0x370/0x3d0
> > ? kfree_skbmem+0x4e/0x90
> > kfree_skbmem+0x4e/0x90
> > kfree_skb+0x47/0xb0
> > __hci_req_sync+0x134/0x2a0 [bluetooth]
> > ? wait_woken+0x70/0x70
> > discov_update+0x2ae/0x310 [bluetooth]
> > process_one_work+0x21d/0x3c0
> > worker_thread+0x53/0x420
> > ? process_one_work+0x3c0/0x3c0
> > kthread+0x127/0x150
> > ? set_kthread_struct+0x50/0x50
> > ret_from_fork+0x1f/0x30
>
> [...]
>
> >
> >So hci_request_cancel_all() -> cancel_work_sync(&hdev->discov_update)
> >and can prevent the race from happening.
>
> Given
>
> __hci_req_sync
> err = hci_req_run_skb(&req, hci_req_sync_complete);
> kfree_skb(hdev->req_skb);
>
> hci_req_sync_complete
> if (skb)
> hdev->req_skb = skb_get(skb);
>
> once more skb_get makes the above race hard to understand, though it should
> better check error before freeing skb there to avoid blind free.
>
> >
> >And the kernel splat is just one symptom of the issue, most of the
>
> It fails to support your reasoning so far.

How? Most of the time the BT controller can't be brought up again
after shutdown(), and we need to stop other activities before that.
What other reasoning is expected?

Kai-Heng

>
> >time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in
> >dmesg.
> >
> >Kai-Heng
>
>
> +++ x/net/bluetooth/hci_request.c
> @@ -257,8 +257,10 @@ int __hci_req_sync(struct hci_dev *hdev,
> break;
> }
>
> - kfree_skb(hdev->req_skb);
> - hdev->req_skb = NULL;
> + if (!err) {
> + kfree_skb(hdev->req_skb);
> + hdev->req_skb = NULL;
> + }
> hdev->req_status = hdev->req_result = 0;
>
> bt_dev_dbg(hdev, "end: err %d", err);

2021-08-05 07:21:54

by Mattijs Korpershoek

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Hi Kai-Heng,

Thanks for your patch,

Kai-Heng Feng <[email protected]> writes:

> On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek
> <[email protected]> wrote:
>>
>> Hi Kai-Heng,
>>
>> Kai-Heng Feng <[email protected]> writes:
>>
>> > Hi Mattijs,
>> >
>> > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
>> > <[email protected]> wrote:
>> >>
>> >> Hi Kai-Heng,
>> >
>> > [snipped]
>> >
>> >> Thank you for your help. Sorry I did not post the logs previously.
>> >>
>> >> dmesg: https://pastebin.com/tpWDNyQr
>> >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw
>> >
>> > Seems like btmtksdio needs shudown() to be called before flush().
>> > Since the order was there for a very long time, changing the calling
>> > order indeed can break what driver expects.
>> > Can you please test the following patch:
>> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> > index 2560ed2f144d..a61e610a400c 100644
>> > --- a/net/bluetooth/hci_core.c
>> > +++ b/net/bluetooth/hci_core.c
>> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
>> > aosp_do_close(hdev);
>> > msft_do_close(hdev);
>> >
>> > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
>> > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
>> > + test_bit(HCI_UP, &hdev->flags)) {
>> > + /* Execute vendor specific shutdown routine */
>> > + if (hdev->shutdown)
>> > + hdev->shutdown(hdev);
>> > + }
>> > +
>> > if (hdev->flush)
>> > hdev->flush(hdev);
>> >
>> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
>> > clear_bit(HCI_INIT, &hdev->flags);
>> > }
>> >
>> > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
>> > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
>> > - test_bit(HCI_UP, &hdev->flags)) {
>> > - /* Execute vendor specific shutdown routine */
>> > - if (hdev->shutdown)
>> > - hdev->shutdown(hdev);
>> > - }
>> > -
>> > /* flush cmd work */
>> > flush_work(&hdev->cmd_work);
>>
>> Thanks for the patch and your help.
>> I've tried it, but it seems that it does not improve for me.
>> I'm still observing:
>>
>> i500-pumpkin login: root
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Connection timed out (110)
>>
>> Logs for this session:
>> dmesg: https://pastebin.com/iAFk5Tzi
>> ftrace: https://pastebin.com/kEMWSYrE
>
> Thanks for the testing!
> What about moving the shutdown() part right after hci_req_sync_lock()
> so tx/rx can still work:
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 2560ed2f144d4..be3113fb7d4b0 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> hci_request_cancel_all(hdev);
> hci_req_sync_lock(hdev);
>
> + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> + test_bit(HCI_UP, &hdev->flags)) {
> + /* Execute vendor specific shutdown routine */
> + if (hdev->shutdown)
> + hdev->shutdown(hdev);
> + }
> +
> if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
> cancel_delayed_work_sync(&hdev->cmd_timer);
> hci_req_sync_unlock(hdev);
> @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> clear_bit(HCI_INIT, &hdev->flags);
> }
>
> - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> - test_bit(HCI_UP, &hdev->flags)) {
> - /* Execute vendor specific shutdown routine */
> - if (hdev->shutdown)
> - hdev->shutdown(hdev);
> - }
> -
> /* flush cmd work */
> flush_work(&hdev->cmd_work);
I confirm this diff works for me:

root@i500-pumpkin:~# hciconfig hci0 up
root@i500-pumpkin:~# hciconfig hci0 down
root@i500-pumpkin:~# hciconfig hci0 up
root@i500-pumpkin:~# hciconfig hci0
hci0: Type: Primary Bus: SDIO
BD Address: 00:0C:E7:55:FF:12 ACL MTU: 1021:8 SCO MTU: 244:4
UP RUNNING
RX bytes:11268 acl:0 sco:0 events:829 errors:0
TX bytes:182569 acl:0 sco:0 commands:829 errors:0

root@i500-pumpkin:~# hcitool scan
Scanning ...
<redacted> Pixel 3 XL

Tested-by: Mattijs Korpershoek <[email protected]>
>
>
>
>
>
>>
>>
>> >
>> > Kai-Heng
>> >
>> >>
>> >> Mattijs
>> >> >
>> >> > Kai-Heng
>> >> >
>> >> >>
>> >> >> Thanks,
>> >> >> Mattijs Korpershoek
>> >> >>
>> >> >>
>> >> >> >
>> >> >> > Regards
>> >> >> >
>> >> >> > Marcel

2021-08-06 23:47:18

by Kai-Heng Feng

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Hi Mattijs,

On Fri, Aug 6, 2021 at 4:51 PM Mattijs Korpershoek
<[email protected]> wrote:
>
> Hi Kai-Heng,
>
> Kai-Heng Feng <[email protected]> writes:
>
> > Hi Mattijs,
> >
> > On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek
> > <[email protected]> wrote:
> >>
> >> Hi Kai-Heng,
> >>
> >> Thanks for your patch,
> >>
> >> Kai-Heng Feng <[email protected]> writes:
> >>
> >
> > [snipped]
> >
> >> I confirm this diff works for me:
> >>
> >> root@i500-pumpkin:~# hciconfig hci0 up
> >> root@i500-pumpkin:~# hciconfig hci0 down
> >> root@i500-pumpkin:~# hciconfig hci0 up
> >> root@i500-pumpkin:~# hciconfig hci0
> >> hci0: Type: Primary Bus: SDIO
> >> BD Address: 00:0C:E7:55:FF:12 ACL MTU: 1021:8 SCO MTU: 244:4
> >> UP RUNNING
> >> RX bytes:11268 acl:0 sco:0 events:829 errors:0
> >> TX bytes:182569 acl:0 sco:0 commands:829 errors:0
> >>
> >> root@i500-pumpkin:~# hcitool scan
> >> Scanning ...
> >> <redacted> Pixel 3 XL
> >>
> >> Tested-by: Mattijs Korpershoek <[email protected]>
> >
> > I found that btmtksdio_flush() only cancels the work instead of doing
> > flush_work(). That probably explains why putting ->shutdown right
> > before ->flush doesn't work.
> > So can you please test the following again:
> > diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c
> > index 9872ef18f9fea..b33c05ad2150b 100644
> > --- a/drivers/bluetooth/btmtksdio.c
> > +++ b/drivers/bluetooth/btmtksdio.c
> > @@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev)
> > {
> > struct btmtksdio_dev *bdev = hci_get_drvdata(hdev);
> >
> > - skb_queue_purge(&bdev->txq);
> > + flush_work(&bdev->tx_work);
> >
> > - cancel_work_sync(&bdev->tx_work);
> > + skb_queue_purge(&bdev->txq);
> >
> > return 0;
> > }
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 2560ed2f144d4..a61e610a400cb 100644
> >
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > aosp_do_close(hdev);
> > msft_do_close(hdev);
> >
> > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > + test_bit(HCI_UP, &hdev->flags)) {
> > + /* Execute vendor specific shutdown routine */
> > + if (hdev->shutdown)
> > + hdev->shutdown(hdev);
> > + }
> > +
> > if (hdev->flush)
> > hdev->flush(hdev);
> >
> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > clear_bit(HCI_INIT, &hdev->flags);
> > }
> >
> > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > - test_bit(HCI_UP, &hdev->flags)) {
> > - /* Execute vendor specific shutdown routine */
> > - if (hdev->shutdown)
> > - hdev->shutdown(hdev);
> > - }
> > -
> > /* flush cmd work */
> > flush_work(&hdev->cmd_work);
> I've tried this but I have the same (broken) symptoms as before.
>
> Here are some logs of v3:
> dmesg: https://pastebin.com/1x4UHkzy
> ftrace: https://pastebin.com/Lm1d6AWy

Thanks for your testing. I think I finally got it:
btmtksdio_shutdown()
-> mtk_hci_wmt_sync()
-> __hci_cmd_send()
then waiting for BTMTKSDIO_TX_WAIT_VND_EVT, which is cleared in
btmtksdio_recv_event():
btmtksdio_recv_event()
-> hci_recv_frame()
-> queue_work(hdev->workqueue, &hdev->rx_work);

That means it has to be done before the following drain_workqueue() call.
Can you please see if moving the ->shutdown() part right before
drain_workqueue() can fix the issue?

Kai-Heng

>
> Mattijs
>
> >
> > Kai-Heng

2021-08-09 09:21:08

by Mattijs Korpershoek

[permalink] [raw]
Subject: Re: [PATCH v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Hi Kai-Heng,

Kai-Heng Feng <[email protected]> writes:

> Hi Mattijs,
>
>> [snipped]
>
> Thanks for your testing. I think I finally got it:
> btmtksdio_shutdown()
> -> mtk_hci_wmt_sync()
> -> __hci_cmd_send()
> then waiting for BTMTKSDIO_TX_WAIT_VND_EVT, which is cleared in
> btmtksdio_recv_event():
> btmtksdio_recv_event()
> -> hci_recv_frame()
> -> queue_work(hdev->workqueue, &hdev->rx_work);
>
> That means it has to be done before the following drain_workqueue() call.
> Can you please see if moving the ->shutdown() part right before
> drain_workqueue() can fix the issue?
I've tested the following patch:

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d..131e69a9a66a 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1757,6 +1757,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
cancel_delayed_work_sync(&adv_instance->rpa_expired_cb);
}

+ if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+ !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+ test_bit(HCI_UP, &hdev->flags)) {
+ /* Execute vendor specific shutdown routine */
+ if (hdev->shutdown)
+ hdev->shutdown(hdev);
+ }
+
/* Avoid potential lockdep warnings from the *_flush() calls by
* ensuring the workqueue is empty up front.
*/
@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
clear_bit(HCI_INIT, &hdev->flags);
}

- if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
- !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
- test_bit(HCI_UP, &hdev->flags)) {
- /* Execute vendor specific shutdown routine */
- if (hdev->shutdown)
- hdev->shutdown(hdev);
- }
-
/* flush cmd work */
flush_work(&hdev->cmd_work);

It does not seem to fix the issue.
Adding the bits in btmtksdio_flush() does not change the result of the
above patch.

Here are the logs. These are just with the above patch (not with the
btmtksdio_flush() changes.

dmesg: https://pastebin.com/FZZBkqGC
btmtksdio_ftrace: https://pastebin.com/JQ0UWenY

Mattijs
>
> Kai-Heng
>
>>
>> Mattijs
>>
>> >
>> > Kai-Heng