2022-05-13 14:25:10

by Ahmad Fatoum

[permalink] [raw]
Subject: [BUG] BLE device unpairing triggers kernel panic

Hello,

On Linux v5.18-rc5, I can reliably crash the kernel on the second (un)pairing
with a customer's BLE device. I have bisected the issue and found two commits:

- Commit 6cd29ec6ae5e ("Bluetooth: hci_sync: Wait for proper events when
connecting LE") causes previously working pairing to time out, presumably
because it keeps waiting for the wrong event.

- Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
fixes, despite the title, what event is waited on. First Pairing works now,
but the second pairing times out and crashes the kernel:

[ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
[ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
[ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8

KASAN tells me that the problem is that the sync workqueue runs with a freed hci_conn:

[ 94.155409] ==================================================================
[ 94.162658] BUG: KASAN: use-after-free in hci_le_conn_failed+0x38/0x224
[ 94.162969] imx-sdma 302c0000.dma-controller: restart cyclic channel 1
[ 94.169316] Read of size 8 at addr ffff00000c88d568 by task kworker/u9:0/70
[ 94.182811]
[ 94.184319] CPU: 3 PID: 70 Comm: kworker/u9:0 Tainted: G O 5.18.0-rc5 #1
[ 94.192869] Hardware name: i.MX8MM custom board
[ 94.197676] Workqueue: hci0 hci_cmd_sync_work
[ 94.202072] Call trace:
[ 94.204530] dump_backtrace+0x198/0x200
[ 94.208394] show_stack+0x1c/0x30
[ 94.211729] dump_stack_lvl+0x64/0x7c
[ 94.215415] print_report+0x158/0x5b0
[ 94.219107] kasan_report+0xac/0xf0
[ 94.222619] __asan_load8+0x8c/0xc4
[ 94.226131] hci_le_conn_failed+0x38/0x224
[ 94.230255] create_le_conn_complete+0x60/0xac
[ 94.234726] hci_cmd_sync_work+0x13c/0x16c
[ 94.238851] process_one_work+0x418/0x7f0
[ 94.242884] worker_thread+0x328/0x7b0
[ 94.246655] kthread+0x138/0x140
[ 94.249913] ret_from_fork+0x10/0x20
[ 94.253512]
[ 94.255011] Allocated by task 0:
[ 94.258251] (stack is not available)
[ 94.261833]
[ 94.263332] Freed by task 365:
[ 94.266400] kasan_save_stack+0x28/0x50
[ 94.270261] kasan_set_track+0x28/0x40
[ 94.274032] kasan_set_free_info+0x24/0x44
[ 94.278168] __kasan_slab_free+0xe8/0x130
[ 94.282203] slab_free_freelist_hook+0x6c/0x220
[ 94.286758] kfree+0xd0/0x2c0
[ 94.289743] bt_link_release+0x18/0x20
[ 94.293517] device_release+0x94/0x10c
[ 94.297294] kobject_put+0x110/0x290
[ 94.300899] put_device+0x18/0x24
[ 94.304238] hci_conn_cleanup+0x1d0/0x1e0
[ 94.308272] hci_conn_del+0x144/0x158
[ 94.311955] hci_conn_failed+0xc0/0xd0
[ 94.315727] le_conn_complete_evt+0x4f4/0x760
[ 94.320107] hci_le_enh_conn_complete_evt+0xe8/0x104
[ 94.325094] hci_le_meta_evt+0x1bc/0x1d8
[ 94.329047] hci_event_packet+0x4e8/0x600
[ 94.333086] hci_rx_work+0x1dc/0x3ac
[ 94.336683] process_one_work+0x418/0x7f0
[ 94.340719] worker_thread+0x328/0x7b0
[ 94.344494] kthread+0x138/0x140
[ 94.347753] ret_from_fork+0x10/0x20
[ 94.351358]
[ 94.352855] The buggy address belongs to the object at ffff00000c88d000
[ 94.352855] which belongs to the cache kmalloc-2k of size 2048
[ 94.365395] The buggy address is located 1384 bytes inside of
[ 94.365395] 2048-byte region [ffff00000c88d000, ffff00000c88d800)
[ 94.377329]
[ 94.378832] The buggy address belongs to the physical page:
[ 94.384421] page:000000009fb0e042 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4c888
[ 94.393754] head:000000009fb0e042 order:3 compound_mapcount:0 compound_pincount:0
[ 94.401256] flags: 0x10200(slab|head|node=0|zone=0)
[ 94.406168] raw: 0000000000010200 dead000000000100 dead000000000122 ffff000000002900
[ 94.413934] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000
[ 94.421697] page dumped because: kasan: bad access detected
[ 94.427285]
[ 94.428783] Memory state around the buggy address:
[ 94.433590] ffff00000c88d400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 94.440835] ffff00000c88d480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 94.448078] >ffff00000c88d500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 94.455316] ^
[ 94.461949] ffff00000c88d580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 94.469196] ffff00000c88d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 94.476436] ==================================================================


As a workaround I am reverting 6cd29ec6ae5e for now, but this doesn't fix
the underlying crash...

I have not been successful in figuring out where the HCI reference counting
goes wrong. Would be great if someone with more familiarity with the code
could look into it. Please let me know what more information may be needed.

Cheers,
Ahmad

---
#regzb introduced: a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |


2022-05-14 04:05:58

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Ahmad,

On Fri, May 13, 2022 at 7:10 AM Ahmad Fatoum <[email protected]> wrote:
>
> Hello,
>
> On Linux v5.18-rc5, I can reliably crash the kernel on the second (un)pairing
> with a customer's BLE device. I have bisected the issue and found two commits:
>
> - Commit 6cd29ec6ae5e ("Bluetooth: hci_sync: Wait for proper events when
> connecting LE") causes previously working pairing to time out, presumably
> because it keeps waiting for the wrong event.

Can you describe in more details what is the second pairing, are you
pairing 2 devices concurrently? I recall someone for nxp having
similar problem, at least the traces look pretty similar, the problem
seems to be the expected event don't match the event the controller
send, in this case hci_le_enh_conn_complete_evt, so hci_event process
it and frees the hci_conn instead of first running the callback.

> - Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
> fixes, despite the title, what event is waited on. First Pairing works now,
> but the second pairing times out and crashes the kernel:
>
> [ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
> [ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
> [ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8
>
> KASAN tells me that the problem is that the sync workqueue runs with a freed hci_conn:
>
> [ 94.155409] ==================================================================
> [ 94.162658] BUG: KASAN: use-after-free in hci_le_conn_failed+0x38/0x224
> [ 94.162969] imx-sdma 302c0000.dma-controller: restart cyclic channel 1
> [ 94.169316] Read of size 8 at addr ffff00000c88d568 by task kworker/u9:0/70
> [ 94.182811]
> [ 94.184319] CPU: 3 PID: 70 Comm: kworker/u9:0 Tainted: G O 5.18.0-rc5 #1
> [ 94.192869] Hardware name: i.MX8MM custom board
> [ 94.197676] Workqueue: hci0 hci_cmd_sync_work
> [ 94.202072] Call trace:
> [ 94.204530] dump_backtrace+0x198/0x200
> [ 94.208394] show_stack+0x1c/0x30
> [ 94.211729] dump_stack_lvl+0x64/0x7c
> [ 94.215415] print_report+0x158/0x5b0
> [ 94.219107] kasan_report+0xac/0xf0
> [ 94.222619] __asan_load8+0x8c/0xc4
> [ 94.226131] hci_le_conn_failed+0x38/0x224
> [ 94.230255] create_le_conn_complete+0x60/0xac
> [ 94.234726] hci_cmd_sync_work+0x13c/0x16c
> [ 94.238851] process_one_work+0x418/0x7f0
> [ 94.242884] worker_thread+0x328/0x7b0
> [ 94.246655] kthread+0x138/0x140
> [ 94.249913] ret_from_fork+0x10/0x20
> [ 94.253512]
> [ 94.255011] Allocated by task 0:
> [ 94.258251] (stack is not available)
> [ 94.261833]
> [ 94.263332] Freed by task 365:
> [ 94.266400] kasan_save_stack+0x28/0x50
> [ 94.270261] kasan_set_track+0x28/0x40
> [ 94.274032] kasan_set_free_info+0x24/0x44
> [ 94.278168] __kasan_slab_free+0xe8/0x130
> [ 94.282203] slab_free_freelist_hook+0x6c/0x220
> [ 94.286758] kfree+0xd0/0x2c0
> [ 94.289743] bt_link_release+0x18/0x20
> [ 94.293517] device_release+0x94/0x10c
> [ 94.297294] kobject_put+0x110/0x290
> [ 94.300899] put_device+0x18/0x24
> [ 94.304238] hci_conn_cleanup+0x1d0/0x1e0
> [ 94.308272] hci_conn_del+0x144/0x158
> [ 94.311955] hci_conn_failed+0xc0/0xd0
> [ 94.315727] le_conn_complete_evt+0x4f4/0x760
> [ 94.320107] hci_le_enh_conn_complete_evt+0xe8/0x104
> [ 94.325094] hci_le_meta_evt+0x1bc/0x1d8
> [ 94.329047] hci_event_packet+0x4e8/0x600
> [ 94.333086] hci_rx_work+0x1dc/0x3ac
> [ 94.336683] process_one_work+0x418/0x7f0
> [ 94.340719] worker_thread+0x328/0x7b0
> [ 94.344494] kthread+0x138/0x140
> [ 94.347753] ret_from_fork+0x10/0x20
> [ 94.351358]
> [ 94.352855] The buggy address belongs to the object at ffff00000c88d000
> [ 94.352855] which belongs to the cache kmalloc-2k of size 2048
> [ 94.365395] The buggy address is located 1384 bytes inside of
> [ 94.365395] 2048-byte region [ffff00000c88d000, ffff00000c88d800)
> [ 94.377329]
> [ 94.378832] The buggy address belongs to the physical page:
> [ 94.384421] page:000000009fb0e042 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4c888
> [ 94.393754] head:000000009fb0e042 order:3 compound_mapcount:0 compound_pincount:0
> [ 94.401256] flags: 0x10200(slab|head|node=0|zone=0)
> [ 94.406168] raw: 0000000000010200 dead000000000100 dead000000000122 ffff000000002900
> [ 94.413934] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000
> [ 94.421697] page dumped because: kasan: bad access detected
> [ 94.427285]
> [ 94.428783] Memory state around the buggy address:
> [ 94.433590] ffff00000c88d400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 94.440835] ffff00000c88d480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 94.448078] >ffff00000c88d500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 94.455316] ^
> [ 94.461949] ffff00000c88d580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 94.469196] ffff00000c88d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [ 94.476436] ==================================================================
>
>
> As a workaround I am reverting 6cd29ec6ae5e for now, but this doesn't fix
> the underlying crash...
>
> I have not been successful in figuring out where the HCI reference counting
> goes wrong. Would be great if someone with more familiarity with the code
> could look into it. Please let me know what more information may be needed.
>
> Cheers,
> Ahmad
>
> ---
> #regzb introduced: a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
>
> --
> Pengutronix e.K. | |
> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



--
Luiz Augusto von Dentz

2022-05-14 04:25:15

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Tedd,

On Fri, May 13, 2022 at 4:52 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Ahmad,
>
> On Fri, May 13, 2022 at 1:14 PM Luiz Augusto von Dentz
> <[email protected]> wrote:
> >
> > Hi Ahmad,
> >
> > On Fri, May 13, 2022 at 7:10 AM Ahmad Fatoum <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > On Linux v5.18-rc5, I can reliably crash the kernel on the second (un)pairing
> > > with a customer's BLE device. I have bisected the issue and found two commits:
> > >
> > > - Commit 6cd29ec6ae5e ("Bluetooth: hci_sync: Wait for proper events when
> > > connecting LE") causes previously working pairing to time out, presumably
> > > because it keeps waiting for the wrong event.
> >
> > Can you describe in more details what is the second pairing, are you
> > pairing 2 devices concurrently? I recall someone for nxp having
> > similar problem, at least the traces look pretty similar, the problem
> > seems to be the expected event don't match the event the controller
> > send, in this case hci_le_enh_conn_complete_evt, so hci_event process
> > it and frees the hci_conn instead of first running the callback.
>
> Looks like my memory failed me on this one, the sync callback is run
> last so we shouldn't cleanup the hci_conn at that point, perhaps
> something like the following should fix the crash:
>
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 0270e597c285..c1634af670b8 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -5632,10 +5632,8 @@ static void le_conn_complete_evt(struct hci_dev
> *hdev, u8 status,
> status = HCI_ERROR_INVALID_PARAMETERS;
> }
>
> - if (status) {
> - hci_conn_failed(conn, status);
> + if (status)
> goto unlock;
> - }
>
> if (conn->dst_type == ADDR_LE_DEV_PUBLIC)
> addr_type = BDADDR_LE_PUBLIC;
>
> > > - Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
> > > fixes, despite the title, what event is waited on. First Pairing works now,
> > > but the second pairing times out and crashes the kernel:
> > >
> > > [ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
> > > [ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
> > > [ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8
>
> That said the error -110 mean -ETIMEDOUT

We might want to incorporate some test to the likes of mgmt-tester to
check this behavior, afaik the crash can be triggered by just causing
a le_conn_complete event with status != 0 which perhaps we need to
extend bthost.c to be able to reject connections.

>
>
> --
> Luiz Augusto von Dentz



--
Luiz Augusto von Dentz

2022-05-14 04:26:50

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Ahmad,

On Fri, May 13, 2022 at 1:14 PM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Ahmad,
>
> On Fri, May 13, 2022 at 7:10 AM Ahmad Fatoum <[email protected]> wrote:
> >
> > Hello,
> >
> > On Linux v5.18-rc5, I can reliably crash the kernel on the second (un)pairing
> > with a customer's BLE device. I have bisected the issue and found two commits:
> >
> > - Commit 6cd29ec6ae5e ("Bluetooth: hci_sync: Wait for proper events when
> > connecting LE") causes previously working pairing to time out, presumably
> > because it keeps waiting for the wrong event.
>
> Can you describe in more details what is the second pairing, are you
> pairing 2 devices concurrently? I recall someone for nxp having
> similar problem, at least the traces look pretty similar, the problem
> seems to be the expected event don't match the event the controller
> send, in this case hci_le_enh_conn_complete_evt, so hci_event process
> it and frees the hci_conn instead of first running the callback.

Looks like my memory failed me on this one, the sync callback is run
last so we shouldn't cleanup the hci_conn at that point, perhaps
something like the following should fix the crash:

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 0270e597c285..c1634af670b8 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -5632,10 +5632,8 @@ static void le_conn_complete_evt(struct hci_dev
*hdev, u8 status,
status = HCI_ERROR_INVALID_PARAMETERS;
}

- if (status) {
- hci_conn_failed(conn, status);
+ if (status)
goto unlock;
- }

if (conn->dst_type == ADDR_LE_DEV_PUBLIC)
addr_type = BDADDR_LE_PUBLIC;

> > - Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
> > fixes, despite the title, what event is waited on. First Pairing works now,
> > but the second pairing times out and crashes the kernel:
> >
> > [ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
> > [ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
> > [ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8

That said the error -110 mean -ETIMEDOUT



--
Luiz Augusto von Dentz

2022-05-16 16:54:29

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hello Luiz,

On 14.05.22 01:52, Luiz Augusto von Dentz wrote:
> On Fri, May 13, 2022 at 1:14 PM Luiz Augusto von Dentz
> <[email protected]> wrote:

Thanks for the quick reply.

>> Hi Ahmad,
>>
>> On Fri, May 13, 2022 at 7:10 AM Ahmad Fatoum <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> On Linux v5.18-rc5, I can reliably crash the kernel on the second (un)pairing
>>> with a customer's BLE device. I have bisected the issue and found two commits:
>>>
>>> - Commit 6cd29ec6ae5e ("Bluetooth: hci_sync: Wait for proper events when
>>> connecting LE") causes previously working pairing to time out, presumably
>>> because it keeps waiting for the wrong event.
>>
>> Can you describe in more details what is the second pairing, are you
>> pairing 2 devices concurrently? I recall someone for nxp having
>> similar problem, at least the traces look pretty similar, the problem
>> seems to be the expected event don't match the event the controller
>> send, in this case hci_le_enh_conn_complete_evt, so hci_event process
>> it and frees the hci_conn instead of first running the callback.

It's the same device. I set the host to pairable, then have the device
pair with the host. Then I unpair on the device and then redo the
same operation again. First one works. Second one fails triggering
the crash.

> Looks like my memory failed me on this one, the sync callback is run
> last so we shouldn't cleanup the hci_conn at that point, perhaps
> something like the following should fix the crash:
>
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 0270e597c285..c1634af670b8 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -5632,10 +5632,8 @@ static void le_conn_complete_evt(struct hci_dev
> *hdev, u8 status,
> status = HCI_ERROR_INVALID_PARAMETERS;
> }
>
> - if (status) {
> - hci_conn_failed(conn, status);
> + if (status)
> goto unlock;
> - }

Yes, this fixes the crash for me. Can you send a patch to that effect?
Feel free to add:

Tested-by: Ahmad Fatoum <[email protected]>

> if (conn->dst_type == ADDR_LE_DEV_PUBLIC)
> addr_type = BDADDR_LE_PUBLIC;
>
>>> - Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
>>> fixes, despite the title, what event is waited on. First Pairing works now,
>>> but the second pairing times out and crashes the kernel:
>>>
>>> [ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
>>> [ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
>>> [ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8
>
> That said the error -110 mean -ETIMEDOUT

Yes, this issue remains still. I feel better about my revert
knowing that the crash is fixed, but I'd like this regression
here fixed upstream as well. I'll try to collect some more
information and report back.

Cheers,
Ahmad

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2022-06-16 10:39:49

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hello Luiz,

On 16.05.22 18:37, Ahmad Fatoum wrote:
>>>> - Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
>>>> fixes, despite the title, what event is waited on. First Pairing works now,
>>>> but the second pairing times out and crashes the kernel:
>>>>
>>>> [ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
>>>> [ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
>>>> [ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8
>>
>> That said the error -110 mean -ETIMEDOUT
>
> Yes, this issue remains still. I feel better about my revert
> knowing that the crash is fixed, but I'd like this regression
> here fixed upstream as well. I'll try to collect some more
> information and report back.

I've now found time to revisit this and sprinkle around some
extra logging. This is the initial pairing that works:

Bluetooth: entered hci_le_create_conn_sync()
Bluetooth: hci0: opcode 0x200d plen 25
Bluetooth: hci0: event 0x0f (sent = 0x0a)
Bluetooth: hci0: BT: opcode 0x200d (sent: 0x0a)
Bluetooth: hci0: event 0x3e (sent = 0x0a)
Bluetooth: hci0: BT: subevent 0x0a (sent 0x0a)
Bluetooth: entered hci_le_meta_evt(event=0x0a) completion clause

I unpaired on device side and then retried pairing:

Bluetooth: entered hci_le_create_conn_sync()
Bluetooth: hci0: opcode 0x200d plen 25
Bluetooth: hci0: event 0x0f (sent = 0x0a)
Bluetooth: hci0: BT: opcode 0x200d (sent: 0x0a)
Bluetooth: entered hci_abort_conn()
Bluetooth: hci0: opcode hci_req_add_ev 0x200e
Bluetooth: hci0: event 0x0e (sent = 0x00)
Bluetooth: hci0: event 0x3e (sent = 0x00)
Bluetooth: hci0: BT: subevent 0x0a (sent 0x00)
Bluetooth: __hci_cmd_sync_sk pending (event = 0x0a status=1, err=-110)
Bluetooth: hci0: Opcode 0x200d failed: -110
Bluetooth: hci0: opcode 0x2006 plen 15
Bluetooth: hci0: event 0x0e (sent = 0x00)
Bluetooth: hci0: opcode 0x200a plen 1
Bluetooth: hci0: event 0x0e (sent = 0x00)
Bluetooth: hci0: request failed to create LE connection: err -110


But now it times out as reported. It looks like the
intermittent hci_abort_conn() is at fault here. My theory is
that replacing hci->sent_cmd is the problem here, as other
events can't be matched anymore.

We've been deploying the revert for a while now and I just posted
it to the mailing list[1]. There have been other reports
of this issue with different hardware too and fixing sent_cmd
would likely be too complicated/time intensive for me.

I am happy to test future patches that fix this properly though.

[1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t

Cheers,
Ahmad



>
> Cheers,
> Ahmad
>


--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2022-06-17 21:06:55

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Ahmad,

On Thu, Jun 16, 2022 at 3:38 AM Ahmad Fatoum <[email protected]> wrote:
>
> Hello Luiz,
>
> On 16.05.22 18:37, Ahmad Fatoum wrote:
> >>>> - Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
> >>>> fixes, despite the title, what event is waited on. First Pairing works now,
> >>>> but the second pairing times out and crashes the kernel:
> >>>>
> >>>> [ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
> >>>> [ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
> >>>> [ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8
> >>
> >> That said the error -110 mean -ETIMEDOUT
> >
> > Yes, this issue remains still. I feel better about my revert
> > knowing that the crash is fixed, but I'd like this regression
> > here fixed upstream as well. I'll try to collect some more
> > information and report back.
>
> I've now found time to revisit this and sprinkle around some
> extra logging. This is the initial pairing that works:
>
> Bluetooth: entered hci_le_create_conn_sync()
> Bluetooth: hci0: opcode 0x200d plen 25
> Bluetooth: hci0: event 0x0f (sent = 0x0a)
> Bluetooth: hci0: BT: opcode 0x200d (sent: 0x0a)
> Bluetooth: hci0: event 0x3e (sent = 0x0a)
> Bluetooth: hci0: BT: subevent 0x0a (sent 0x0a)
> Bluetooth: entered hci_le_meta_evt(event=0x0a) completion clause
>
> I unpaired on device side and then retried pairing:
>
> Bluetooth: entered hci_le_create_conn_sync()
> Bluetooth: hci0: opcode 0x200d plen 25
> Bluetooth: hci0: event 0x0f (sent = 0x0a)
> Bluetooth: hci0: BT: opcode 0x200d (sent: 0x0a)
> Bluetooth: entered hci_abort_conn()
> Bluetooth: hci0: opcode hci_req_add_ev 0x200e
> Bluetooth: hci0: event 0x0e (sent = 0x00)
> Bluetooth: hci0: event 0x3e (sent = 0x00)
> Bluetooth: hci0: BT: subevent 0x0a (sent 0x00)
> Bluetooth: __hci_cmd_sync_sk pending (event = 0x0a status=1, err=-110)
> Bluetooth: hci0: Opcode 0x200d failed: -110
> Bluetooth: hci0: opcode 0x2006 plen 15
> Bluetooth: hci0: event 0x0e (sent = 0x00)
> Bluetooth: hci0: opcode 0x200a plen 1
> Bluetooth: hci0: event 0x0e (sent = 0x00)
> Bluetooth: hci0: request failed to create LE connection: err -110
>
>
> But now it times out as reported. It looks like the
> intermittent hci_abort_conn() is at fault here. My theory is
> that replacing hci->sent_cmd is the problem here, as other
> events can't be matched anymore.

Yep, unpair command uses hci_abort_conn when it should really be using
hci_abort_conn_sync, the problem is if we do that then it probably no
longer work because it would have to wait for sync queue to complete
so it would only be able to disconnect after the connect command
completes, well perhaps that is acceptable otherwise we need a
different queue to handle command that abort/cancel other already in
the queue.

> We've been deploying the revert for a while now and I just posted
> it to the mailing list[1]. There have been other reports
> of this issue with different hardware too and fixing sent_cmd
> would likely be too complicated/time intensive for me.
>
> I am happy to test future patches that fix this properly though.
>
> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
>
> Cheers,
> Ahmad
>
>
>
> >
> > Cheers,
> > Ahmad
> >
>
>
> --
> Pengutronix e.K. | |
> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



--
Luiz Augusto von Dentz

2022-06-20 10:09:11

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Luiz,

On 17.06.22 22:48, Luiz Augusto von Dentz wrote:
> On Thu, Jun 16, 2022 at 3:38 AM Ahmad Fatoum <[email protected]> wrote:
>> On 16.05.22 18:37, Ahmad Fatoum wrote:
>>>>>> - Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
>>>>>> fixes, despite the title, what event is waited on. First Pairing works now,
>>>>>> but the second pairing times out and crashes the kernel:
>>>>>>
>>>>>> [ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
>>>>>> [ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
>>>>>> [ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8
>>>>
>>>> That said the error -110 mean -ETIMEDOUT
>>>
>>> Yes, this issue remains still. I feel better about my revert
>>> knowing that the crash is fixed, but I'd like this regression
>>> here fixed upstream as well. I'll try to collect some more
>>> information and report back.
>>
>> I've now found time to revisit this and sprinkle around some
>> extra logging. This is the initial pairing that works:
>>
>> Bluetooth: entered hci_le_create_conn_sync()
>> Bluetooth: hci0: opcode 0x200d plen 25
>> Bluetooth: hci0: event 0x0f (sent = 0x0a)
>> Bluetooth: hci0: BT: opcode 0x200d (sent: 0x0a)
>> Bluetooth: hci0: event 0x3e (sent = 0x0a)
>> Bluetooth: hci0: BT: subevent 0x0a (sent 0x0a)
>> Bluetooth: entered hci_le_meta_evt(event=0x0a) completion clause
>>
>> I unpaired on device side and then retried pairing:
>>
>> Bluetooth: entered hci_le_create_conn_sync()
>> Bluetooth: hci0: opcode 0x200d plen 25
>> Bluetooth: hci0: event 0x0f (sent = 0x0a)
>> Bluetooth: hci0: BT: opcode 0x200d (sent: 0x0a)
>> Bluetooth: entered hci_abort_conn()
>> Bluetooth: hci0: opcode hci_req_add_ev 0x200e
>> Bluetooth: hci0: event 0x0e (sent = 0x00)
>> Bluetooth: hci0: event 0x3e (sent = 0x00)
>> Bluetooth: hci0: BT: subevent 0x0a (sent 0x00)
>> Bluetooth: __hci_cmd_sync_sk pending (event = 0x0a status=1, err=-110)
>> Bluetooth: hci0: Opcode 0x200d failed: -110
>> Bluetooth: hci0: opcode 0x2006 plen 15
>> Bluetooth: hci0: event 0x0e (sent = 0x00)
>> Bluetooth: hci0: opcode 0x200a plen 1
>> Bluetooth: hci0: event 0x0e (sent = 0x00)
>> Bluetooth: hci0: request failed to create LE connection: err -110
>>
>>
>> But now it times out as reported. It looks like the
>> intermittent hci_abort_conn() is at fault here. My theory is
>> that replacing hci->sent_cmd is the problem here, as other
>> events can't be matched anymore.
>
> Yep, unpair command uses hci_abort_conn when it should really be using
> hci_abort_conn_sync, the problem is if we do that then it probably no
> longer work because it would have to wait for sync queue to complete
> so it would only be able to disconnect after the connect command
> completes, well perhaps that is acceptable

Disconnect of connection #1 being processed after new connection #2
concluded sounds wrong. Would I be able to reconnect
afterwards or would all connections, but the first, be directly
disconnected...?

> otherwise we need a
> different queue to handle command that abort/cancel other already in
> the queue.

Is the revert an acceptable interim solution or are there issues
I am missing?

Cheers,
Ahmad

>
>> We've been deploying the revert for a while now and I just posted
>> it to the mailing list[1]. There have been other reports
>> of this issue with different hardware too and fixing sent_cmd
>> would likely be too complicated/time intensive for me.
>>
>> I am happy to test future patches that fix this properly though.
>>
>> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
>>
>> Cheers,
>> Ahmad
>>
>>
>>
>>>
>>> Cheers,
>>> Ahmad
>>>
>>
>>
>> --
>> Pengutronix e.K. | |
>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>
>
>


--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2022-06-20 20:46:40

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Ahmad,

On Mon, Jun 20, 2022 at 3:06 AM Ahmad Fatoum <[email protected]> wrote:
>
> Hi Luiz,
>
> On 17.06.22 22:48, Luiz Augusto von Dentz wrote:
> > On Thu, Jun 16, 2022 at 3:38 AM Ahmad Fatoum <[email protected]> wrote:
> >> On 16.05.22 18:37, Ahmad Fatoum wrote:
> >>>>>> - Commit a56a1138cbd8 ("Bluetooth: hci_sync: Fix not using conn_timeout")
> >>>>>> fixes, despite the title, what event is waited on. First Pairing works now,
> >>>>>> but the second pairing times out and crashes the kernel:
> >>>>>>
> >>>>>> [ 84.191684] Bluetooth: hci0: Opcode 0x200d failed: -110
> >>>>>> [ 84.230478] Bluetooth: hci0: request failed to create LE connection: err -110
> >>>>>> [ 84.237690] Unable to handle kernel read from unreadable memory at virtual address 0000000000000ca8
> >>>>
> >>>> That said the error -110 mean -ETIMEDOUT
> >>>
> >>> Yes, this issue remains still. I feel better about my revert
> >>> knowing that the crash is fixed, but I'd like this regression
> >>> here fixed upstream as well. I'll try to collect some more
> >>> information and report back.
> >>
> >> I've now found time to revisit this and sprinkle around some
> >> extra logging. This is the initial pairing that works:
> >>
> >> Bluetooth: entered hci_le_create_conn_sync()
> >> Bluetooth: hci0: opcode 0x200d plen 25
> >> Bluetooth: hci0: event 0x0f (sent = 0x0a)
> >> Bluetooth: hci0: BT: opcode 0x200d (sent: 0x0a)
> >> Bluetooth: hci0: event 0x3e (sent = 0x0a)
> >> Bluetooth: hci0: BT: subevent 0x0a (sent 0x0a)
> >> Bluetooth: entered hci_le_meta_evt(event=0x0a) completion clause
> >>
> >> I unpaired on device side and then retried pairing:
> >>
> >> Bluetooth: entered hci_le_create_conn_sync()
> >> Bluetooth: hci0: opcode 0x200d plen 25
> >> Bluetooth: hci0: event 0x0f (sent = 0x0a)
> >> Bluetooth: hci0: BT: opcode 0x200d (sent: 0x0a)
> >> Bluetooth: entered hci_abort_conn()
> >> Bluetooth: hci0: opcode hci_req_add_ev 0x200e
> >> Bluetooth: hci0: event 0x0e (sent = 0x00)
> >> Bluetooth: hci0: event 0x3e (sent = 0x00)
> >> Bluetooth: hci0: BT: subevent 0x0a (sent 0x00)
> >> Bluetooth: __hci_cmd_sync_sk pending (event = 0x0a status=1, err=-110)
> >> Bluetooth: hci0: Opcode 0x200d failed: -110
> >> Bluetooth: hci0: opcode 0x2006 plen 15
> >> Bluetooth: hci0: event 0x0e (sent = 0x00)
> >> Bluetooth: hci0: opcode 0x200a plen 1
> >> Bluetooth: hci0: event 0x0e (sent = 0x00)
> >> Bluetooth: hci0: request failed to create LE connection: err -110
> >>
> >>
> >> But now it times out as reported. It looks like the
> >> intermittent hci_abort_conn() is at fault here. My theory is
> >> that replacing hci->sent_cmd is the problem here, as other
> >> events can't be matched anymore.
> >
> > Yep, unpair command uses hci_abort_conn when it should really be using
> > hci_abort_conn_sync, the problem is if we do that then it probably no
> > longer work because it would have to wait for sync queue to complete
> > so it would only be able to disconnect after the connect command
> > completes, well perhaps that is acceptable
>
> Disconnect of connection #1 being processed after new connection #2
> concluded sounds wrong. Would I be able to reconnect
> afterwards or would all connections, but the first, be directly
> disconnected...?

That depends on the order you have queued the commands, it will be
processed in the exact order it is received, that why I said it is
single queue design, and it is done like that to prevent messing up
with states since we know the exact order the commands will be sent.

> > otherwise we need a
> > different queue to handle command that abort/cancel other already in
> > the queue.
>
> Is the revert an acceptable interim solution or are there issues
> I am missing?

Afaik there were problem with concurrent connections request, so what
would really help us here is to have some tests to emulate this
scenario with our CI, in the meantime please check if the following
fixes your problem:

https://gist.github.com/Vudentz/b4fff292c7f4ad55ca3299fd5ab797ae

> Cheers,
> Ahmad
>
> >
> >> We've been deploying the revert for a while now and I just posted
> >> it to the mailing list[1]. There have been other reports
> >> of this issue with different hardware too and fixing sent_cmd
> >> would likely be too complicated/time intensive for me.
> >>
> >> I am happy to test future patches that fix this properly though.
> >>
> >> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
> >>
> >> Cheers,
> >> Ahmad
> >>
> >>
> >>
> >>>
> >>> Cheers,
> >>> Ahmad
> >>>
> >>
> >>
> >> --
> >> Pengutronix e.K. | |
> >> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> >> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> >> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> >
> >
> >
>
>
> --
> Pengutronix e.K. | |
> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



--
Luiz Augusto von Dentz

2022-06-21 08:43:55

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hello Luiz,

On 20.06.22 22:18, Luiz Augusto von Dentz wrote:
> On Mon, Jun 20, 2022 at 3:06 AM Ahmad Fatoum <[email protected]> wrote:
>> Disconnect of connection #1 being processed after new connection #2
>> concluded sounds wrong. Would I be able to reconnect
>> afterwards or would all connections, but the first, be directly
>> disconnected...?
>
> That depends on the order you have queued the commands, it will be
> processed in the exact order it is received, that why I said it is
> single queue design, and it is done like that to prevent messing up
> with states since we know the exact order the commands will be sent.
>
>>> otherwise we need a
>>> different queue to handle command that abort/cancel other already in
>>> the queue.
>>
>> Is the revert an acceptable interim solution or are there issues
>> I am missing?
>
> Afaik there were problem with concurrent connections request, so what
> would really help us here is to have some tests to emulate this
> scenario with our CI, in the meantime please check if the following
> fixes your problem:
>
> https://gist.github.com/Vudentz/b4fff292c7f4ad55ca3299fd5ab797ae

Doesn't help unfortunately. First pairing works as before.
Second still fails:

Bluetooth: hci0: Opcode 0x200d failed: -110
Bluetooth: hci0: request failed to create LE connection: err -110

Cheers,
Ahmad

>
>> Cheers,
>> Ahmad
>>
>>>
>>>> We've been deploying the revert for a while now and I just posted
>>>> it to the mailing list[1]. There have been other reports
>>>> of this issue with different hardware too and fixing sent_cmd
>>>> would likely be too complicated/time intensive for me.
>>>>
>>>> I am happy to test future patches that fix this properly though.
>>>>
>>>> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
>>>>
>>>> Cheers,
>>>> Ahmad
>>>>
>>>>
>>>>
>>>>>
>>>>> Cheers,
>>>>> Ahmad
>>>>>
>>>>
>>>>
>>>> --
>>>> Pengutronix e.K. | |
>>>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
>>>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
>>>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>>>
>>>
>>>
>>
>>
>> --
>> Pengutronix e.K. | |
>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>
>
>


--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2022-06-21 18:54:01

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Ahmad, Tedd,

On Tue, Jun 21, 2022 at 1:32 AM Ahmad Fatoum <[email protected]> wrote:
>
> Hello Luiz,
>
> On 20.06.22 22:18, Luiz Augusto von Dentz wrote:
> > On Mon, Jun 20, 2022 at 3:06 AM Ahmad Fatoum <[email protected]> wrote:
> >> Disconnect of connection #1 being processed after new connection #2
> >> concluded sounds wrong. Would I be able to reconnect
> >> afterwards or would all connections, but the first, be directly
> >> disconnected...?
> >
> > That depends on the order you have queued the commands, it will be
> > processed in the exact order it is received, that why I said it is
> > single queue design, and it is done like that to prevent messing up
> > with states since we know the exact order the commands will be sent.
> >
> >>> otherwise we need a
> >>> different queue to handle command that abort/cancel other already in
> >>> the queue.
> >>
> >> Is the revert an acceptable interim solution or are there issues
> >> I am missing?
> >
> > Afaik there were problem with concurrent connections request, so what
> > would really help us here is to have some tests to emulate this
> > scenario with our CI, in the meantime please check if the following
> > fixes your problem:
> >
> > https://gist.github.com/Vudentz/b4fff292c7f4ad55ca3299fd5ab797ae
>
> Doesn't help unfortunately. First pairing works as before.
> Second still fails:
>
> Bluetooth: hci0: Opcode 0x200d failed: -110
> Bluetooth: hci0: request failed to create LE connection: err -110

Can we try to add a test in mgmt-tester to reproduce the error above?

> Cheers,
> Ahmad
>
> >
> >> Cheers,
> >> Ahmad
> >>
> >>>
> >>>> We've been deploying the revert for a while now and I just posted
> >>>> it to the mailing list[1]. There have been other reports
> >>>> of this issue with different hardware too and fixing sent_cmd
> >>>> would likely be too complicated/time intensive for me.
> >>>>
> >>>> I am happy to test future patches that fix this properly though.
> >>>>
> >>>> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
> >>>>
> >>>> Cheers,
> >>>> Ahmad
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> Cheers,
> >>>>> Ahmad
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Pengutronix e.K. | |
> >>>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> >>>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> >>>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Pengutronix e.K. | |
> >> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> >> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> >> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> >
> >
> >
>
>
> --
> Pengutronix e.K. | |
> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



--
Luiz Augusto von Dentz

2022-06-24 12:56:15

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Tedd, Luiz,

On 21.06.22 20:52, Luiz Augusto von Dentz wrote:
> Hi Ahmad, Tedd,
>
> On Tue, Jun 21, 2022 at 1:32 AM Ahmad Fatoum <[email protected]> wrote:
>>
>> Hello Luiz,
>>
>> On 20.06.22 22:18, Luiz Augusto von Dentz wrote:
>>> On Mon, Jun 20, 2022 at 3:06 AM Ahmad Fatoum <[email protected]> wrote:
>>>> Disconnect of connection #1 being processed after new connection #2
>>>> concluded sounds wrong. Would I be able to reconnect
>>>> afterwards or would all connections, but the first, be directly
>>>> disconnected...?
>>>
>>> That depends on the order you have queued the commands, it will be
>>> processed in the exact order it is received, that why I said it is
>>> single queue design, and it is done like that to prevent messing up
>>> with states since we know the exact order the commands will be sent.
>>>
>>>>> otherwise we need a
>>>>> different queue to handle command that abort/cancel other already in
>>>>> the queue.
>>>>
>>>> Is the revert an acceptable interim solution or are there issues
>>>> I am missing?
>>>
>>> Afaik there were problem with concurrent connections request, so what
>>> would really help us here is to have some tests to emulate this
>>> scenario with our CI, in the meantime please check if the following
>>> fixes your problem:
>>>
>>> https://gist.github.com/Vudentz/b4fff292c7f4ad55ca3299fd5ab797ae
>>
>> Doesn't help unfortunately. First pairing works as before.
>> Second still fails:
>>
>> Bluetooth: hci0: Opcode 0x200d failed: -110
>> Bluetooth: hci0: request failed to create LE connection: err -110
>
> Can we try to add a test in mgmt-tester to reproduce the error above?

I am not familiar with mgmt-tester. What information do you
need to reproduce? In the meantime, can we revert the commit?
I understand that this may break other uses, but I believe
previously working stuff should have precedence..

Cheers,
Ahmad

>
>> Cheers,
>> Ahmad
>>
>>>
>>>> Cheers,
>>>> Ahmad
>>>>
>>>>>
>>>>>> We've been deploying the revert for a while now and I just posted
>>>>>> it to the mailing list[1]. There have been other reports
>>>>>> of this issue with different hardware too and fixing sent_cmd
>>>>>> would likely be too complicated/time intensive for me.
>>>>>>
>>>>>> I am happy to test future patches that fix this properly though.
>>>>>>
>>>>>> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
>>>>>>
>>>>>> Cheers,
>>>>>> Ahmad
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Ahmad
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Pengutronix e.K. | |
>>>>>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
>>>>>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
>>>>>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Pengutronix e.K. | |
>>>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
>>>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
>>>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>>>
>>>
>>>
>>
>>
>> --
>> Pengutronix e.K. | |
>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>
>
>


--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2022-06-24 20:14:11

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi Ahmad,

On Fri, Jun 24, 2022 at 5:53 AM Ahmad Fatoum <[email protected]> wrote:
>
> Hi Tedd, Luiz,
>
> On 21.06.22 20:52, Luiz Augusto von Dentz wrote:
> > Hi Ahmad, Tedd,
> >
> > On Tue, Jun 21, 2022 at 1:32 AM Ahmad Fatoum <[email protected]> wrote:
> >>
> >> Hello Luiz,
> >>
> >> On 20.06.22 22:18, Luiz Augusto von Dentz wrote:
> >>> On Mon, Jun 20, 2022 at 3:06 AM Ahmad Fatoum <[email protected]> wrote:
> >>>> Disconnect of connection #1 being processed after new connection #2
> >>>> concluded sounds wrong. Would I be able to reconnect
> >>>> afterwards or would all connections, but the first, be directly
> >>>> disconnected...?
> >>>
> >>> That depends on the order you have queued the commands, it will be
> >>> processed in the exact order it is received, that why I said it is
> >>> single queue design, and it is done like that to prevent messing up
> >>> with states since we know the exact order the commands will be sent.
> >>>
> >>>>> otherwise we need a
> >>>>> different queue to handle command that abort/cancel other already in
> >>>>> the queue.
> >>>>
> >>>> Is the revert an acceptable interim solution or are there issues
> >>>> I am missing?
> >>>
> >>> Afaik there were problem with concurrent connections request, so what
> >>> would really help us here is to have some tests to emulate this
> >>> scenario with our CI, in the meantime please check if the following
> >>> fixes your problem:
> >>>
> >>> https://gist.github.com/Vudentz/b4fff292c7f4ad55ca3299fd5ab797ae
> >>
> >> Doesn't help unfortunately. First pairing works as before.
> >> Second still fails:
> >>
> >> Bluetooth: hci0: Opcode 0x200d failed: -110
> >> Bluetooth: hci0: request failed to create LE connection: err -110
> >
> > Can we try to add a test in mgmt-tester to reproduce the error above?
>
> I am not familiar with mgmt-tester. What information do you
> need to reproduce? In the meantime, can we revert the commit?
> I understand that this may break other uses, but I believe
> previously working stuff should have precedence..

Have a looks at:

https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/doc/test-runner.txt

And then run with:

sudo tools/test-runner -k <pathyto/bzImage> -- tools/mgmt-tester

Btw, can we have the exact steps to reproduce it using bluetoothctl if possible?

> Cheers,
> Ahmad
>
> >
> >> Cheers,
> >> Ahmad
> >>
> >>>
> >>>> Cheers,
> >>>> Ahmad
> >>>>
> >>>>>
> >>>>>> We've been deploying the revert for a while now and I just posted
> >>>>>> it to the mailing list[1]. There have been other reports
> >>>>>> of this issue with different hardware too and fixing sent_cmd
> >>>>>> would likely be too complicated/time intensive for me.
> >>>>>>
> >>>>>> I am happy to test future patches that fix this properly though.
> >>>>>>
> >>>>>> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Ahmad
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Ahmad
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Pengutronix e.K. | |
> >>>>>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> >>>>>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> >>>>>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Pengutronix e.K. | |
> >>>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> >>>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> >>>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Pengutronix e.K. | |
> >> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> >> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> >> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> >
> >
> >
>
>
> --
> Pengutronix e.K. | |
> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



--
Luiz Augusto von Dentz

2022-07-04 12:19:13

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Looks like the discussions to fix this regression got stuck. What can be
done to get thing rolling again? Or has progress been made and I just
missed it? Ciao, Thorsten

On 24.06.22 21:59, Luiz Augusto von Dentz wrote:
> On Fri, Jun 24, 2022 at 5:53 AM Ahmad Fatoum <[email protected]> wrote:
>> On 21.06.22 20:52, Luiz Augusto von Dentz wrote:
>>> On Tue, Jun 21, 2022 at 1:32 AM Ahmad Fatoum <[email protected]> wrote:
>>>> On 20.06.22 22:18, Luiz Augusto von Dentz wrote:
>>>>> On Mon, Jun 20, 2022 at 3:06 AM Ahmad Fatoum <[email protected]> wrote:
>>>>>> Disconnect of connection #1 being processed after new connection #2
>>>>>> concluded sounds wrong. Would I be able to reconnect
>>>>>> afterwards or would all connections, but the first, be directly
>>>>>> disconnected...?
>>>>>
>>>>> That depends on the order you have queued the commands, it will be
>>>>> processed in the exact order it is received, that why I said it is
>>>>> single queue design, and it is done like that to prevent messing up
>>>>> with states since we know the exact order the commands will be sent.
>>>>>
>>>>>>> otherwise we need a
>>>>>>> different queue to handle command that abort/cancel other already in
>>>>>>> the queue.
>>>>>>
>>>>>> Is the revert an acceptable interim solution or are there issues
>>>>>> I am missing?
>>>>>
>>>>> Afaik there were problem with concurrent connections request, so what
>>>>> would really help us here is to have some tests to emulate this
>>>>> scenario with our CI, in the meantime please check if the following
>>>>> fixes your problem:
>>>>>
>>>>> https://gist.github.com/Vudentz/b4fff292c7f4ad55ca3299fd5ab797ae
>>>>
>>>> Doesn't help unfortunately. First pairing works as before.
>>>> Second still fails:
>>>>
>>>> Bluetooth: hci0: Opcode 0x200d failed: -110
>>>> Bluetooth: hci0: request failed to create LE connection: err -110
>>>
>>> Can we try to add a test in mgmt-tester to reproduce the error above?
>>
>> I am not familiar with mgmt-tester. What information do you
>> need to reproduce? In the meantime, can we revert the commit?
>> I understand that this may break other uses, but I believe
>> previously working stuff should have precedence..
>
> Have a looks at:
>
> https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/doc/test-runner.txt
>
> And then run with:
>
> sudo tools/test-runner -k <pathyto/bzImage> -- tools/mgmt-tester
>
> Btw, can we have the exact steps to reproduce it using bluetoothctl if possible?
>
>> Cheers,
>> Ahmad
>>
>>>
>>>> Cheers,
>>>> Ahmad
>>>>
>>>>>
>>>>>> Cheers,
>>>>>> Ahmad
>>>>>>
>>>>>>>
>>>>>>>> We've been deploying the revert for a while now and I just posted
>>>>>>>> it to the mailing list[1]. There have been other reports
>>>>>>>> of this issue with different hardware too and fixing sent_cmd
>>>>>>>> would likely be too complicated/time intensive for me.
>>>>>>>>
>>>>>>>> I am happy to test future patches that fix this properly though.
>>>>>>>>
>>>>>>>> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t

2022-07-07 05:47:29

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hi,

On 04.07.22 14:11, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> Looks like the discussions to fix this regression got stuck. What can be
> done to get thing rolling again? Or has progress been made and I just
> missed it? Ciao, Thorsten

No progress has been made as far as I am aware. I am reverting the commit
introducing the regression on my systems and haven't yet had the time to
debug this further to help find an alternative solution.

Cheers,
Ahmad

>
> On 24.06.22 21:59, Luiz Augusto von Dentz wrote:
>> On Fri, Jun 24, 2022 at 5:53 AM Ahmad Fatoum <[email protected]> wrote:
>>> On 21.06.22 20:52, Luiz Augusto von Dentz wrote:
>>>> On Tue, Jun 21, 2022 at 1:32 AM Ahmad Fatoum <[email protected]> wrote:
>>>>> On 20.06.22 22:18, Luiz Augusto von Dentz wrote:
>>>>>> On Mon, Jun 20, 2022 at 3:06 AM Ahmad Fatoum <[email protected]> wrote:
>>>>>>> Disconnect of connection #1 being processed after new connection #2
>>>>>>> concluded sounds wrong. Would I be able to reconnect
>>>>>>> afterwards or would all connections, but the first, be directly
>>>>>>> disconnected...?
>>>>>>
>>>>>> That depends on the order you have queued the commands, it will be
>>>>>> processed in the exact order it is received, that why I said it is
>>>>>> single queue design, and it is done like that to prevent messing up
>>>>>> with states since we know the exact order the commands will be sent.
>>>>>>
>>>>>>>> otherwise we need a
>>>>>>>> different queue to handle command that abort/cancel other already in
>>>>>>>> the queue.
>>>>>>>
>>>>>>> Is the revert an acceptable interim solution or are there issues
>>>>>>> I am missing?
>>>>>>
>>>>>> Afaik there were problem with concurrent connections request, so what
>>>>>> would really help us here is to have some tests to emulate this
>>>>>> scenario with our CI, in the meantime please check if the following
>>>>>> fixes your problem:
>>>>>>
>>>>>> https://gist.github.com/Vudentz/b4fff292c7f4ad55ca3299fd5ab797ae
>>>>>
>>>>> Doesn't help unfortunately. First pairing works as before.
>>>>> Second still fails:
>>>>>
>>>>> Bluetooth: hci0: Opcode 0x200d failed: -110
>>>>> Bluetooth: hci0: request failed to create LE connection: err -110
>>>>
>>>> Can we try to add a test in mgmt-tester to reproduce the error above?
>>>
>>> I am not familiar with mgmt-tester. What information do you
>>> need to reproduce? In the meantime, can we revert the commit?
>>> I understand that this may break other uses, but I believe
>>> previously working stuff should have precedence..
>>
>> Have a looks at:
>>
>> https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/doc/test-runner.txt
>>
>> And then run with:
>>
>> sudo tools/test-runner -k <pathyto/bzImage> -- tools/mgmt-tester
>>
>> Btw, can we have the exact steps to reproduce it using bluetoothctl if possible?
>>
>>> Cheers,
>>> Ahmad
>>>
>>>>
>>>>> Cheers,
>>>>> Ahmad
>>>>>
>>>>>>
>>>>>>> Cheers,
>>>>>>> Ahmad
>>>>>>>
>>>>>>>>
>>>>>>>>> We've been deploying the revert for a while now and I just posted
>>>>>>>>> it to the mailing list[1]. There have been other reports
>>>>>>>>> of this issue with different hardware too and fixing sent_cmd
>>>>>>>>> would likely be too complicated/time intensive for me.
>>>>>>>>>
>>>>>>>>> I am happy to test future patches that fix this properly though.
>>>>>>>>>
>>>>>>>>> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
>


--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2022-08-17 10:39:52

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

On 07.07.22 07:45, Ahmad Fatoum wrote:
> On 04.07.22 14:11, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>> to make this easily accessible to everyone.
>>
>> Looks like the discussions to fix this regression got stuck. What can be
>> done to get thing rolling again? Or has progress been made and I just
>> missed it? Ciao, Thorsten
>
> No progress has been made as far as I am aware. I am reverting the commit
> introducing the regression on my systems and haven't yet had the time to
> debug this further to help find an alternative solution.

Just wondering: still no progress I assume?

Anyway: I'm putting it on the backburner to get it out of the
spot-light, as this doesn't look urgent and things seems to progress
very slowly.

#regzbot backburner: debuging is slow this further

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.


>> On 24.06.22 21:59, Luiz Augusto von Dentz wrote:
>>> On Fri, Jun 24, 2022 at 5:53 AM Ahmad Fatoum <[email protected]> wrote:
>>>> On 21.06.22 20:52, Luiz Augusto von Dentz wrote:
>>>>> On Tue, Jun 21, 2022 at 1:32 AM Ahmad Fatoum <[email protected]> wrote:
>>>>>> On 20.06.22 22:18, Luiz Augusto von Dentz wrote:
>>>>>>> On Mon, Jun 20, 2022 at 3:06 AM Ahmad Fatoum <[email protected]> wrote:
>>>>>>>> Disconnect of connection #1 being processed after new connection #2
>>>>>>>> concluded sounds wrong. Would I be able to reconnect
>>>>>>>> afterwards or would all connections, but the first, be directly
>>>>>>>> disconnected...?
>>>>>>>
>>>>>>> That depends on the order you have queued the commands, it will be
>>>>>>> processed in the exact order it is received, that why I said it is
>>>>>>> single queue design, and it is done like that to prevent messing up
>>>>>>> with states since we know the exact order the commands will be sent.
>>>>>>>
>>>>>>>>> otherwise we need a
>>>>>>>>> different queue to handle command that abort/cancel other already in
>>>>>>>>> the queue.
>>>>>>>>
>>>>>>>> Is the revert an acceptable interim solution or are there issues
>>>>>>>> I am missing?
>>>>>>>
>>>>>>> Afaik there were problem with concurrent connections request, so what
>>>>>>> would really help us here is to have some tests to emulate this
>>>>>>> scenario with our CI, in the meantime please check if the following
>>>>>>> fixes your problem:
>>>>>>>
>>>>>>> https://gist.github.com/Vudentz/b4fff292c7f4ad55ca3299fd5ab797ae
>>>>>>
>>>>>> Doesn't help unfortunately. First pairing works as before.
>>>>>> Second still fails:
>>>>>>
>>>>>> Bluetooth: hci0: Opcode 0x200d failed: -110
>>>>>> Bluetooth: hci0: request failed to create LE connection: err -110
>>>>>
>>>>> Can we try to add a test in mgmt-tester to reproduce the error above?
>>>>
>>>> I am not familiar with mgmt-tester. What information do you
>>>> need to reproduce? In the meantime, can we revert the commit?
>>>> I understand that this may break other uses, but I believe
>>>> previously working stuff should have precedence..
>>>
>>> Have a looks at:
>>>
>>> https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/doc/test-runner.txt
>>>
>>> And then run with:
>>>
>>> sudo tools/test-runner -k <pathyto/bzImage> -- tools/mgmt-tester
>>>
>>> Btw, can we have the exact steps to reproduce it using bluetoothctl if possible?
>>>
>>>> Cheers,
>>>> Ahmad
>>>>
>>>>>
>>>>>> Cheers,
>>>>>> Ahmad
>>>>>>
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Ahmad
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> We've been deploying the revert for a while now and I just posted
>>>>>>>>>> it to the mailing list[1]. There have been other reports
>>>>>>>>>> of this issue with different hardware too and fixing sent_cmd
>>>>>>>>>> would likely be too complicated/time intensive for me.
>>>>>>>>>>
>>>>>>>>>> I am happy to test future patches that fix this properly though.
>>>>>>>>>>
>>>>>>>>>> [1]: https://lore.kernel.org/linux-bluetooth/[email protected]/T/#t
>>
>
>

2023-04-04 12:21:24

by Thorsten Leemhuis

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

[TLDR: This mail in primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 17.08.22 12:24, Thorsten Leemhuis wrote:
> On 07.07.22 07:45, Ahmad Fatoum wrote:
>> On 04.07.22 14:11, Thorsten Leemhuis wrote:
>>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>>> to make this easily accessible to everyone.
>>>
>>> Looks like the discussions to fix this regression got stuck. What can be
>>> done to get thing rolling again? Or has progress been made and I just
>>> missed it? Ciao, Thorsten
>>
>> No progress has been made as far as I am aware. I am reverting the commit
>> introducing the regression on my systems and haven't yet had the time to
>> debug this further to help find an alternative solution.
>
> Just wondering: still no progress I assume?
>
> Anyway: I'm putting it on the backburner to get it out of the
> spot-light, as this doesn't look urgent and things seems to progress
> very slowly.
>
> #regzbot backburner: debuging is slow this further

Great, I see (from interpreting
https://lore.kernel.org/all/[email protected]/
[thx!] and looking at the described errors) that this was finally addressed:

#regzbot fix: 2731e038a76d
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-04-04 12:21:41

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: [BUG] BLE device unpairing triggers kernel panic

Hello Thorsten,

On 04.04.23 14:14, Linux regression tracking #update (Thorsten Leemhuis) wrote:
> [TLDR: This mail in primarily relevant for Linux kernel regression
> tracking. See link in footer if these mails annoy you.]
>
> On 17.08.22 12:24, Thorsten Leemhuis wrote:
>> On 07.07.22 07:45, Ahmad Fatoum wrote:
>>> On 04.07.22 14:11, Thorsten Leemhuis wrote:
>>>> Hi, this is your Linux kernel regression tracker. Top-posting for once,
>>>> to make this easily accessible to everyone.
>>>>
>>>> Looks like the discussions to fix this regression got stuck. What can be
>>>> done to get thing rolling again? Or has progress been made and I just
>>>> missed it? Ciao, Thorsten
>>>
>>> No progress has been made as far as I am aware. I am reverting the commit
>>> introducing the regression on my systems and haven't yet had the time to
>>> debug this further to help find an alternative solution.
>>
>> Just wondering: still no progress I assume?
>>
>> Anyway: I'm putting it on the backburner to get it out of the
>> spot-light, as this doesn't look urgent and things seems to progress
>> very slowly.
>>
>> #regzbot backburner: debuging is slow this further
>
> Great, I see (from interpreting
> https://lore.kernel.org/all/[email protected]/
> [thx!] and looking at the described errors) that this was finally addressed:

No. It's still broken for me, but now the kernel warning that came
along with it is gone. I am discussing this with Luiz still on Github,
but this regression is _not_ resolved.

Thanks,
Ahmad

>
> #regzbot fix: 2731e038a76d
> #regzbot ignore-activity
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
>
>

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |