2021-12-09 01:59:08

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel

syzbot has bisected this issue to:

commit c97a747efc93f94a4ad6c707972dfbf8d774edf9
Author: Benjamin Berg <[email protected]>
Date: Fri Dec 3 14:59:02 2021 +0000

Bluetooth: btusb: Cancel sync commands for certain URB errors

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10a0fdc5b00000
start commit: 4eee8d0b64ec Add linux-next specific files for 20211208
git tree: linux-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=12a0fdc5b00000
console output: https://syzkaller.appspot.com/x/log.txt?x=14a0fdc5b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=20b74d9da4ce1ef1
dashboard link: https://syzkaller.appspot.com/bug?extid=485cc00ea7cf41dfdbf1
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13e7e955b00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10ed3641b00000

Reported-by: [email protected]
Fixes: c97a747efc93 ("Bluetooth: btusb: Cancel sync commands for certain URB errors")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


2021-12-09 10:07:01

by Oliver Neukum

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel


On 09.12.21 02:59, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit c97a747efc93f94a4ad6c707972dfbf8d774edf9
> Author: Benjamin Berg <[email protected]>
> Date: Fri Dec 3 14:59:02 2021 +0000
>
> Bluetooth: btusb: Cancel sync commands for certain URB errors

Hi,

looking at the patch, it sleeps in an interrupt handler (or equivalent)
in two places:

@@ -933,6 +933,8 @@ static void btusb_intr_complete(struct urb *urb)
                if (err != -EPERM && err != -ENODEV)
                        bt_dev_err(hdev, "urb %p failed to resubmit (%d)",
                                   urb, -err);
+               if (err != -EPERM)
+                       hci_cmd_sync_cancel(hdev, -err);


@@ -1331,10 +1335,13 @@ static void btusb_tx_complete(struct urb *urb)
        if (!test_bit(HCI_RUNNING, &hdev->flags))
                goto done;
 
-       if (!urb->status)
+       if (!urb->status) {
                hdev->stat.byte_tx += urb->transfer_buffer_length;
-       else
+       } else {
+               if (hci_skb_pkt_type(skb) == HCI_COMMAND_PKT)
+                       hci_cmd_sync_cancel(hdev, -urb->status);

As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
just not
an approach you can take. It looks like asynchronously canceling the
scheduled work
would result in a race, so I would for now just revert.

What issue exactly is this trying to fix or improve?

    Regards
        Oliver


2021-12-09 12:46:56

by Benjamin Berg

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel

Hi,

On Thu, 2021-12-09 at 11:06 +0100, Oliver Neukum wrote:
> As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
> just not
> an approach you can take. It looks like asynchronously canceling the
> scheduled work
> would result in a race, so I would for now just revert.

Right, so this needs to be pushed into a workqueue instead, I suppose.

> What issue exactly is this trying to fix or improve?

The problem is aborting long-running synchronous operations. i.e.
without this patchset, USB enumeration will hang for 10s if a USB
bluetooth device disappears during firmware loading. This is because
even though the USB device is gone and all URB submissions fail, the
operation will only be aborted after the internal timeout happens.

The device in turn disappears because an rfkill switch is blocked and
the platform removes it from the bus. Overall, this can lead to
graphical login to hang as fprintd cannot initialise as it hangs in USB
enumeration.

Benjamin


Attachments:
signature.asc (833.00 B)
This is a digitally signed message part

2021-12-09 13:21:31

by Oliver Neukum

[permalink] [raw]
Subject: Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel


On 09.12.21 13:46, Benjamin Berg wrote:
Hi,
> On Thu, 2021-12-09 at 11:06 +0100, Oliver Neukum wrote:
>> As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
>> just not
>> an approach you can take. It looks like asynchronously canceling the
>> scheduled work
>> would result in a race, so I would for now just revert.
> Right, so this needs to be pushed into a workqueue instead, I suppose.
It looks like overkill, but I have no good alternative to offer either.
>
>> What issue exactly is this trying to fix or improve?
> The problem is aborting long-running synchronous operations. i.e.
> without this patchset, USB enumeration will hang for 10s if a USB
> bluetooth device disappears during firmware loading. This is because
> even though the USB device is gone and all URB submissions fail, the
> operation will only be aborted after the internal timeout happens.
>
I see. Something ought to be done. The issue is in good hands.

    Thanks
        Oliver