2017-11-16 21:45:04

by Lukas Wunner

[permalink] [raw]
Subject: Re: Boot crash @hci_uart_tx_wakeup+0x38/0x148 w/ Linus' HEAD?

On Thu, Nov 16, 2017 at 10:30:45PM +0100, Lukas Wunner wrote:
> On Thu, Nov 16, 2017 at 12:58:27PM -0800, John Stultz wrote:
> > On Wed, Nov 15, 2017 at 6:00 PM, John Stultz <[email protected]> wrote:
> > > After updating to Linus' HEAD today, I'm seeing the following odd
> > > boot time crash with the HiKey board (which uses the serdev driver).
> > >
> > > [ 1.963009] Unable to handle kernel read from unreadable memory at
> > > virtual address 406f127000
> > > [ 1.963012] Mem abort info:
> > > [ 1.963015] ESR = 0x96000005
> > > [ 1.963018] Exception class = DABT (current EL), IL = 32 bits
> > > [ 1.963021] SET = 0, FnV = 0
> > > [ 1.963023] EA = 0, S1PTW = 0
> > > [ 1.963025] Data abort info:
> > > [ 1.963027] ISV = 0, ISS = 0x00000005
> > > [ 1.963030] CM = 0, WnR = 0
> > > [ 1.963032] [000000406f127000] user address but active_mm is swapper
> > > [ 1.963038] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> > > [ 1.963046] CPU: 1 PID: 1282 Comm: kworker/u17:1 Not tainted
> > > 4.14.0-07281-g1b386f4 #666
> > > [ 1.963050] Hardware name: HiKey Development Board (DT)
> > > [ 1.963068] Workqueue: hci0 hci_cmd_work
> > > [ 1.963074] task: ffffffc0753c8000 task.stack: ffffff800b6c0000
> > > [ 1.963079] pstate: 80400005 (Nzcv daif +PAN -UAO)
> > > [ 1.963090] pc : hci_uart_tx_wakeup+0x38/0x148
> > > [ 1.963095] lr : hci_uart_tx_wakeup+0x30/0x148
> > > [ 1.963098] sp : ffffff800b6c3d10
> > > [ 1.963101] x29: ffffff800b6c3d10 x28: 0000000000000000
> > > [ 1.963107] x27: ffffffc074a79c78 x26: ffffffc07504d830
> > > [ 1.963112] x25: ffffff8008f76a30 x24: ffffffc07510a840
> > > [ 1.963117] x23: ffffffc07510aa10 x22: 0000000000000001
> > > [ 1.963122] x21: ffffff8008cd2000 x20: ffffffc0351cf288
> > > [ 1.963128] x19: ffffffc0351cf218 x18: 0000000000000000
> > > [ 1.963132] x17: 0000000000000000 x16: 0000000000000000
> > > [ 1.963137] x15: 0000000000000000 x14: ffffffc005fa1c00
> > > [ 1.963142] x13: 000000406f127000 x12: 0000000034d5d91d
> > > [ 1.963147] x11: 0000000000000400 x10: ffffffc077f8c480
> > > [ 1.963152] x9 : 0000000000000000 x8 : 0000000000000005
> > > [ 1.963157] x7 : 0000000000000000 x6 : ffffff8008f59e88
> > > [ 1.963162] x5 : 0000000000000003 x4 : 000000406f127000
> > > [ 1.963167] x3 : 0000000000000001 x2 : 000000406f127000
> > > [ 1.963172] x1 : ffffff8008cd2d28 x0 : 0000000000000000
> > > [ 1.963179] Process kworker/u17:1 (pid: 1282, stack limit =
> > > 0xffffff800b6c0000)
> > > [ 1.963182] Call trace:
> > > [ 1.963188] hci_uart_tx_wakeup+0x38/0x148
> > > [ 1.963193] hci_uart_send_frame+0x28/0x38
> > > [ 1.963197] hci_send_frame+0x64/0xc0
> > > [ 1.963201] hci_cmd_work+0x98/0x110
> > > [ 1.963209] process_one_work+0x134/0x330
> > > [ 1.963214] worker_thread+0x130/0x468
> > > [ 1.963220] kthread+0xf8/0x128
> > > [ 1.963227] ret_from_fork+0x10/0x18
> > > [ 1.963234] Code: 9134a2a0 97f3be03 f9402280 d538d082 (b8626801)
> > > [ 1.963239] ---[ end trace 457a26b9096bec64 ]---
> > >
> >
> > So I bisected this down and the issue looks like the boot regression is from:
> > 67d2f8781b9f ("Bluetooth: hci_ldisc: Allow sleeping while proto
> > locks are held")
> >
> > Reverting that change makes things work again.
>
> Hm, I notice percpu_init_rwsem() is only ever called in the hci_ldisc
> case (from hci_uart_tty_open()) but apparently never in the hci_serdev
> case. Could that be the culprit? Wondering why it's working in the
> rwlock case then, perhaps by luck?

As a shot in the dark, does the below patch help?

-- >8 --
diff --git a/drivers/bluetooth/hci_serdev.c b/drivers/bluetooth/hci_serdev.c
index 71664b2..2582bef 100644
--- a/drivers/bluetooth/hci_serdev.c
+++ b/drivers/bluetooth/hci_serdev.c
@@ -304,6 +304,8 @@ int hci_uart_register_device(struct hci_uart *hu,

INIT_WORK(&hu->write_work, hci_uart_write_work);

+ percpu_init_rwsem(&hu->proto_lock);
+
/* Only when vendor specific setup callback is provided, consider
* the manufacturer information valid. This avoids filling in the
* value for Ericsson when nothing is specified.


2017-11-17 06:07:29

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops

Hi Lukas,

> John Stultz reports a boot time crash with the HiKey board (which uses
> hci_serdev) occurring in hci_uart_tx_wakeup(). That function is
> contained in hci_ldisc.c, but also called from the newer hci_serdev.c.
> It acquires the proto_lock in struct hci_uart and it turns out that we
> forgot to init the lock in the serdev code path, thus causing the crash.
>
> John bisected the crash to commit 67d2f8781b9f ("Bluetooth: hci_ldisc:
> Allow sleeping while proto locks are held"), but the issue was present
> before and the commit merely exposed it. (Perhaps by luck, the crash
> did not occur with rwlocks.)
>
> Init the proto_lock in the serdev code path to avoid the oops.
>
> Stack trace for posterity:
>
> Unable to handle kernel read from unreadable memory at 406f127000
> [000000406f127000] user address but active_mm is swapper
> Internal error: Oops: 96000005 [#1] PREEMPT SMP
> Hardware name: HiKey Development Board (DT)
> Call trace:
> hci_uart_tx_wakeup+0x38/0x148
> hci_uart_send_frame+0x28/0x38
> hci_send_frame+0x64/0xc0
> hci_cmd_work+0x98/0x110
> process_one_work+0x134/0x330
> worker_thread+0x130/0x468
> kthread+0xf8/0x128
> ret_from_fork+0x10/0x18
>
> Link: https://lkml.org/lkml/2017/11/15/908
> Reported-and-tested-by: John Stultz <[email protected]>
> Cc: Ronald Tschalär <[email protected]>
> Cc: Rob Herring <[email protected]>
> Cc: Sumit Semwal <[email protected]>
> Signed-off-by: Lukas Wunner <[email protected]>
> ---
> @Rob (and everyone else): I'm not sure if this is in fact the correct
> approach, or if we should instead duplicate hci_uart_tx_wakeup() in
> hci_serdev.c (sans locking?), much as we've duplicated a lot of other
> functions there. Let me know what your preference is. Thanks!
>
> drivers/bluetooth/hci_serdev.c | 1 +
> 1 file changed, 1 insertion(+)

patch has been applied to bluetooth-next tree.

Regards

Marcel


2017-11-16 23:59:55

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops

On Fri, Nov 17, 2017 at 12:54:53AM +0100, Lukas Wunner wrote:
> John Stultz reports a boot time crash with the HiKey board (which uses
> hci_serdev) occurring in hci_uart_tx_wakeup(). That function is
> contained in hci_ldisc.c, but also called from the newer hci_serdev.c.
> It acquires the proto_lock in struct hci_uart and it turns out that we
> forgot to init the lock in the serdev code path, thus causing the crash.
>
> John bisected the crash to commit 67d2f8781b9f ("Bluetooth: hci_ldisc:
> Allow sleeping while proto locks are held"), but the issue was present
> before and the commit merely exposed it. (Perhaps by luck, the crash
> did not occur with rwlocks.)
>
> Init the proto_lock in the serdev code path to avoid the oops.

Apologies, I botched Gustavo Padovan's and Johan Hedberg's e-mail
address in the To: header, please fix up when replying.

Lukas

2017-11-16 23:54:53

by Lukas Wunner

[permalink] [raw]
Subject: [PATCH] Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops

John Stultz reports a boot time crash with the HiKey board (which uses
hci_serdev) occurring in hci_uart_tx_wakeup(). That function is
contained in hci_ldisc.c, but also called from the newer hci_serdev.c.
It acquires the proto_lock in struct hci_uart and it turns out that we
forgot to init the lock in the serdev code path, thus causing the crash.

John bisected the crash to commit 67d2f8781b9f ("Bluetooth: hci_ldisc:
Allow sleeping while proto locks are held"), but the issue was present
before and the commit merely exposed it. (Perhaps by luck, the crash
did not occur with rwlocks.)

Init the proto_lock in the serdev code path to avoid the oops.

Stack trace for posterity:

Unable to handle kernel read from unreadable memory at 406f127000
[000000406f127000] user address but active_mm is swapper
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Hardware name: HiKey Development Board (DT)
Call trace:
hci_uart_tx_wakeup+0x38/0x148
hci_uart_send_frame+0x28/0x38
hci_send_frame+0x64/0xc0
hci_cmd_work+0x98/0x110
process_one_work+0x134/0x330
worker_thread+0x130/0x468
kthread+0xf8/0x128
ret_from_fork+0x10/0x18

Link: https://lkml.org/lkml/2017/11/15/908
Reported-and-tested-by: John Stultz <[email protected]>
Cc: Ronald Tschalär <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Sumit Semwal <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
@Rob (and everyone else): I'm not sure if this is in fact the correct
approach, or if we should instead duplicate hci_uart_tx_wakeup() in
hci_serdev.c (sans locking?), much as we've duplicated a lot of other
functions there. Let me know what your preference is. Thanks!

drivers/bluetooth/hci_serdev.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/hci_serdev.c b/drivers/bluetooth/hci_serdev.c
index 71664b2..e0e6461 100644
--- a/drivers/bluetooth/hci_serdev.c
+++ b/drivers/bluetooth/hci_serdev.c
@@ -303,6 +303,7 @@ int hci_uart_register_device(struct hci_uart *hu,
hci_set_drvdata(hdev, hu);

INIT_WORK(&hu->write_work, hci_uart_write_work);
+ percpu_init_rwsem(&hu->proto_lock);

/* Only when vendor specific setup callback is provided, consider
* the manufacturer information valid. This avoids filling in the
--
2.11.0

2017-11-16 22:29:51

by John Stultz

[permalink] [raw]
Subject: Re: Boot crash @hci_uart_tx_wakeup+0x38/0x148 w/ Linus' HEAD?

On Thu, Nov 16, 2017 at 1:45 PM, Lukas Wunner <[email protected]> wrote:
> On Thu, Nov 16, 2017 at 10:30:45PM +0100, Lukas Wunner wrote:
>> On Thu, Nov 16, 2017 at 12:58:27PM -0800, John Stultz wrote:
>> > On Wed, Nov 15, 2017 at 6:00 PM, John Stultz <[email protected]> wrote:
>> > > After updating to Linus' HEAD today, I'm seeing the following odd
>> > > boot time crash with the HiKey board (which uses the serdev driver).
>> > >
>> > > [ 1.963009] Unable to handle kernel read from unreadable memory at
>> > > virtual address 406f127000
>> > > [ 1.963012] Mem abort info:
>> > > [ 1.963015] ESR = 0x96000005
>> > > [ 1.963018] Exception class = DABT (current EL), IL = 32 bits
>> > > [ 1.963021] SET = 0, FnV = 0
>> > > [ 1.963023] EA = 0, S1PTW = 0
>> > > [ 1.963025] Data abort info:
>> > > [ 1.963027] ISV = 0, ISS = 0x00000005
>> > > [ 1.963030] CM = 0, WnR = 0
>> > > [ 1.963032] [000000406f127000] user address but active_mm is swapper
>> > > [ 1.963038] Internal error: Oops: 96000005 [#1] PREEMPT SMP
>> > > [ 1.963046] CPU: 1 PID: 1282 Comm: kworker/u17:1 Not tainted
>> > > 4.14.0-07281-g1b386f4 #666
>> > > [ 1.963050] Hardware name: HiKey Development Board (DT)
>> > > [ 1.963068] Workqueue: hci0 hci_cmd_work
>> > > [ 1.963074] task: ffffffc0753c8000 task.stack: ffffff800b6c0000
>> > > [ 1.963079] pstate: 80400005 (Nzcv daif +PAN -UAO)
>> > > [ 1.963090] pc : hci_uart_tx_wakeup+0x38/0x148
>> > > [ 1.963095] lr : hci_uart_tx_wakeup+0x30/0x148
>> > > [ 1.963098] sp : ffffff800b6c3d10
>> > > [ 1.963101] x29: ffffff800b6c3d10 x28: 0000000000000000
>> > > [ 1.963107] x27: ffffffc074a79c78 x26: ffffffc07504d830
>> > > [ 1.963112] x25: ffffff8008f76a30 x24: ffffffc07510a840
>> > > [ 1.963117] x23: ffffffc07510aa10 x22: 0000000000000001
>> > > [ 1.963122] x21: ffffff8008cd2000 x20: ffffffc0351cf288
>> > > [ 1.963128] x19: ffffffc0351cf218 x18: 0000000000000000
>> > > [ 1.963132] x17: 0000000000000000 x16: 0000000000000000
>> > > [ 1.963137] x15: 0000000000000000 x14: ffffffc005fa1c00
>> > > [ 1.963142] x13: 000000406f127000 x12: 0000000034d5d91d
>> > > [ 1.963147] x11: 0000000000000400 x10: ffffffc077f8c480
>> > > [ 1.963152] x9 : 0000000000000000 x8 : 0000000000000005
>> > > [ 1.963157] x7 : 0000000000000000 x6 : ffffff8008f59e88
>> > > [ 1.963162] x5 : 0000000000000003 x4 : 000000406f127000
>> > > [ 1.963167] x3 : 0000000000000001 x2 : 000000406f127000
>> > > [ 1.963172] x1 : ffffff8008cd2d28 x0 : 0000000000000000
>> > > [ 1.963179] Process kworker/u17:1 (pid: 1282, stack limit =
>> > > 0xffffff800b6c0000)
>> > > [ 1.963182] Call trace:
>> > > [ 1.963188] hci_uart_tx_wakeup+0x38/0x148
>> > > [ 1.963193] hci_uart_send_frame+0x28/0x38
>> > > [ 1.963197] hci_send_frame+0x64/0xc0
>> > > [ 1.963201] hci_cmd_work+0x98/0x110
>> > > [ 1.963209] process_one_work+0x134/0x330
>> > > [ 1.963214] worker_thread+0x130/0x468
>> > > [ 1.963220] kthread+0xf8/0x128
>> > > [ 1.963227] ret_from_fork+0x10/0x18
>> > > [ 1.963234] Code: 9134a2a0 97f3be03 f9402280 d538d082 (b8626801)
>> > > [ 1.963239] ---[ end trace 457a26b9096bec64 ]---
>> > >
>> >
>> > So I bisected this down and the issue looks like the boot regression is from:
>> > 67d2f8781b9f ("Bluetooth: hci_ldisc: Allow sleeping while proto
>> > locks are held")
>> >
>> > Reverting that change makes things work again.
>>
>> Hm, I notice percpu_init_rwsem() is only ever called in the hci_ldisc
>> case (from hci_uart_tty_open()) but apparently never in the hci_serdev
>> case. Could that be the culprit? Wondering why it's working in the
>> rwlock case then, perhaps by luck?
>
> As a shot in the dark, does the below patch help?
>
> -- >8 --
> diff --git a/drivers/bluetooth/hci_serdev.c b/drivers/bluetooth/hci_serdev.c
> index 71664b2..2582bef 100644
> --- a/drivers/bluetooth/hci_serdev.c
> +++ b/drivers/bluetooth/hci_serdev.c
> @@ -304,6 +304,8 @@ int hci_uart_register_device(struct hci_uart *hu,
>
> INIT_WORK(&hu->write_work, hci_uart_write_work);
>
> + percpu_init_rwsem(&hu->proto_lock);
> +

Yes. This seems to avoid the issue!

thanks
-john

2017-12-11 21:49:11

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops

On Thu, Nov 16, 2017 at 10:07 PM, Marcel Holtmann <[email protected]> wrote:
> Hi Lukas,
>
>> John Stultz reports a boot time crash with the HiKey board (which uses
>> hci_serdev) occurring in hci_uart_tx_wakeup(). That function is
>> contained in hci_ldisc.c, but also called from the newer hci_serdev.c.
>> It acquires the proto_lock in struct hci_uart and it turns out that we
>> forgot to init the lock in the serdev code path, thus causing the crash.
>>
>> John bisected the crash to commit 67d2f8781b9f ("Bluetooth: hci_ldisc:
>> Allow sleeping while proto locks are held"), but the issue was present
>> before and the commit merely exposed it. (Perhaps by luck, the crash
>> did not occur with rwlocks.)
>>
>> Init the proto_lock in the serdev code path to avoid the oops.
>>
[snip]
> patch has been applied to bluetooth-next tree.

Sorry to be a nuisance if its just a timing thing, but I wanted to
follow up on this just to make sure it didn't fall through the cracks,
as I noticed w/ -rc3 it hasn't landed yet.

thanks
-john

2018-01-02 20:45:44

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops

Hi John,

>>>> John Stultz reports a boot time crash with the HiKey board (which uses
>>>> hci_serdev) occurring in hci_uart_tx_wakeup(). That function is
>>>> contained in hci_ldisc.c, but also called from the newer hci_serdev.c.
>>>> It acquires the proto_lock in struct hci_uart and it turns out that we
>>>> forgot to init the lock in the serdev code path, thus causing the crash.
>>>>
>>>> John bisected the crash to commit 67d2f8781b9f ("Bluetooth: hci_ldisc:
>>>> Allow sleeping while proto locks are held"), but the issue was present
>>>> before and the commit merely exposed it. (Perhaps by luck, the crash
>>>> did not occur with rwlocks.)
>>>>
>>>> Init the proto_lock in the serdev code path to avoid the oops.
>>>>
>> [snip]
>>> patch has been applied to bluetooth-next tree.
>>
>> Sorry to be a nuisance if its just a timing thing, but I wanted to
>> follow up on this just to make sure it didn't fall through the cracks,
>> as I noticed w/ -rc3 it hasn't landed yet.
>
> Happy new year all,
> Just wanted to send another ping on this as it seems it hasn't made
> it into -rc6. Did this get missed as being tagged as a (at least
> functional) regression fix?

since it wasn’t marked as urgent fix, it never went on the path to -rc. It is on the path for the next release. And then can be included in -stable if needed.

Regards

Marcel

2018-01-02 20:39:59

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH] Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops

On Mon, Dec 11, 2017 at 1:49 PM, John Stultz <[email protected]> wrote:
> On Thu, Nov 16, 2017 at 10:07 PM, Marcel Holtmann <[email protected]> wrote:
>> Hi Lukas,
>>
>>> John Stultz reports a boot time crash with the HiKey board (which uses
>>> hci_serdev) occurring in hci_uart_tx_wakeup(). That function is
>>> contained in hci_ldisc.c, but also called from the newer hci_serdev.c.
>>> It acquires the proto_lock in struct hci_uart and it turns out that we
>>> forgot to init the lock in the serdev code path, thus causing the crash.
>>>
>>> John bisected the crash to commit 67d2f8781b9f ("Bluetooth: hci_ldisc:
>>> Allow sleeping while proto locks are held"), but the issue was present
>>> before and the commit merely exposed it. (Perhaps by luck, the crash
>>> did not occur with rwlocks.)
>>>
>>> Init the proto_lock in the serdev code path to avoid the oops.
>>>
> [snip]
>> patch has been applied to bluetooth-next tree.
>
> Sorry to be a nuisance if its just a timing thing, but I wanted to
> follow up on this just to make sure it didn't fall through the cracks,
> as I noticed w/ -rc3 it hasn't landed yet.

Happy new year all,
Just wanted to send another ping on this as it seems it hasn't made
it into -rc6. Did this get missed as being tagged as a (at least
functional) regression fix?

thanks
-john

2018-02-01 06:42:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Backport d73e17281665 to v4.15 stable

On Thu, Feb 01, 2018 at 06:50:52AM +0100, Lukas Wunner wrote:
> Dear Greg,
>
> when you refill the patch queue for v4.15 stable one of these days,
> please consider adding
>
> commit d73e172816652772114827abaa2dbc053eecbbd7
> Author: Lukas Wunner <[email protected]>
> Date: Fri Nov 17 00:54:53 2017 +0100
>
> Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops
>
> to fix a boot crash on certain machines with UART-attached Bluetooth
> which was kindly reported by John Stultz. The patch unfortunately
> lacked a Fixes: or Cc: stable tag. It landed in Linus' tree 7 hours ago.

Now queued up, thanks.

greg k-h

2018-02-01 05:50:52

by Lukas Wunner

[permalink] [raw]
Subject: Backport d73e17281665 to v4.15 stable

Dear Greg,

when you refill the patch queue for v4.15 stable one of these days,
please consider adding

commit d73e172816652772114827abaa2dbc053eecbbd7
Author: Lukas Wunner <[email protected]>
Date: Fri Nov 17 00:54:53 2017 +0100

Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops

to fix a boot crash on certain machines with UART-attached Bluetooth
which was kindly reported by John Stultz. The patch unfortunately
lacked a Fixes: or Cc: stable tag. It landed in Linus' tree 7 hours ago.

Thanks!

Lukas