LinuxLists.cc - [PATCH 4.16 REGRESSION fix 0/2] Bluetooth: Fix hci_bcm BT devices getting stuck in runtime-suspended status

2018-03-14 22:06:01

Subject: [PATCH 4.16 REGRESSION fix 0/2] Bluetooth: Fix hci_bcm BT devices getting stuck in runtime-suspended status

Hi All,

See the individual commit messages for the what and why of this series
only patch 1/2 is really necessary to fix the regression.

The regression changes a theoretical race window in the hci_bcm IRQ
handling into a real one and the first patch reverts the offending commit
making the race window a theoretical thing again (we would need a 5 second
stall in the right place to trigger it).

The second patch in the series contains a simple fix closing the theoretical
race window, this was tested with the regression still in place and it does
successfully get things unstuck after hitting the race. I guess this can
wait for 4.17, but I'm sending this out as a series as the patches were
developed in tandem.

This series was tested on an Asus T100CHI with a bcm43241b4 wifi/bt combo
chip.

Regards,

Hans

2018-03-15 18:40:21

by Marcel Holtmann

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters

Hi Hans,

> The IRQ output of the bcm bt-device is really a level IRQ signal, which
> signals a logical high as long as the device's buffer contains data. Since
> the draining in the buffer is done in the tty driver, we cannot (easily)
> wait in a threaded interrupt handler for the draining, after which the
> IRQ should go low again.
>
> So instead we treat the IRQ as an edge interrupt. This opens the window
> for a theoretical race where we wakeup, read some data and then autosuspend
> *before* the IRQ has gone (logical) low, followed by the device just at
> that moment receiving more data, causing the IRQ to stay high and we never
> see an edge.
>
> Since we call pm_runtime_mark_last_busy() on every received byte, there
> should be plenty time for the IRQ to go (logical) low before we ever
> suspend, so this should never happen, but after commit 43fff7683468
> ("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted
> since, this was actually happening causing the device to get stuck in
> runtime suspend.
>
> The bcm bt-device actually has a workaround for this, if we set the
> pulsed_host_wake flag in the sleep parameters, then the device monitors
> if the host is draining the buffer and if not then after a timeout the
> device will pulse the IRQ line, causing us to see an edge, fixing the
> stuck in suspend condition.
>
> This commit sets the pulsed_host_wake flag to fix the (mostly theoretical)
> race caused by us treating the IRQ as an edge IRQ.
>
> Signed-off-by: Hans de Goede <[email protected]>
> ---
> drivers/bluetooth/hci_bcm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

patch has been applied to bluetooth-stable tree.

Regards

Marcel

2018-03-15 18:40:19

by Marcel Holtmann

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

Hi Hans,

> This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime
> PM code"). The commit msg for this commit states "No functional change
> intended.", but replacing:
>
> pm_runtime_get();
> pm_runtime_mark_last_busy();
> pm_runtime_put_autosuspend();
>
> with:
>
> pm_request_resume();
>
> Does result in a functional change, pm_request_resume() only calls
> pm_runtime_mark_last_busy() if the device was suspended before the call.
>
> This results in the following happening:
>
> 1) Device is runtime suspended
> 2) Device drives host_wake IRQ logically high as it starts receiving data
> 3) bcm_host_wake() gets called, causes the device to runtime-resume,
> current time gets marked as last_busy time
> 4) After 5 seconds the autosuspend timer expires and the dev autosuspends
> as no one has been calling pm_runtime_mark_last_busy(), the device was
> resumed during those 5 seconds, so all the pm_request_resume() calls
> while receiving data and/or bcm_host_wake() calls were nops
> 5) If 4) happens while the device has (just received) data in its buffer to
> be read by the host the IRQ line is *already* / still logically high
> when we autosuspend and since we use an edge triggered IRQ, the IRQ
> will never trigger, causing the device to get stuck in suspend
>
> Therefor this commit has to be reverted, so that we avoid the device
> getting stuck in suspend.
>
> Cc: Lukas Wunner <[email protected]>
> Signed-off-by: Hans de Goede <[email protected]>
> ---
> drivers/bluetooth/hci_bcm.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)

patch has been applied to bluetooth-stable tree.

Regards

Marcel

2018-03-15 13:49:50

by Hans de Goede

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

Hi,

On 15-03-18 14:15, Marcel Holtmann wrote:
> Hi Lukas,
>
>>>>> We're quite far into the cycle already and this is a serious regression,
>>>>> also nothing of great value is lost by the revert, the original commit
>>>>> was a minor cleanup which turns out to have bad side-effects, a simple
>>>>> revert really is the best solution here, esp. in this point of the cycle.
>>>>
>>>> Just an hour ago he sent me the patch to look over it. And we're at
>>>> least two and a half weeks away from v4.16.
>>>
>>> No we are *only* two and a half weeks away from v4.16 (worst case scenario)
>>> and Linus does not like getting last minute fixes.
>>
>> That doesn't preclude allowing a few hours to discuss things.
>> There is never such a rush. In the present case, a new contributor
>> was willing to debug the issue and submit a patch. Onboarding new
>> contributors is important and IMO it's worth waiting a few days for
>> them to sort things out, even if it means a regression stays present
>> a little longer. I'm sorry that it meant you wasted time debugging
>> it in parallel.
>>
>> That said, when submitting the patch I clearly failed to notice that
>> for devices using autosuspend, pm_request_resume() doesn't update
>> the last usage timestamp. While that could be fixed by calling
>> pm_runtime_mark_last_busy() before pm_request_resume(), it doesn't
>> seem to be customary as a look at all the call sites of
>> pm_request_resume() shows. The original three-line sequence,
>> although quite verbose, appears to be what is commonly used in such
>> a case. For this reason reverting back to the original version
>> seems justified.
>
> there is no reason to rush this through. With a properly worded commit message that explain the reason, I have no problem to do a last minute -rc inclusion.
>
> However what I like to have is a single patch with all Acks and also CC: stable tags if required that we can just send off in the direction towards Linus. I saw there is an alternative patch on the mailing list. So I would like to have a good conclusion on what goes to -rc and that we all agree.

I think we are all in agreement now to just go with the revert patch which I did,
Lucas acked that patch this morning.

Regards,

Hans

2018-03-15 13:15:10

by Marcel Holtmann

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

Hi Lukas,

>>>> We're quite far into the cycle already and this is a serious regression,
>>>> also nothing of great value is lost by the revert, the original commit
>>>> was a minor cleanup which turns out to have bad side-effects, a simple
>>>> revert really is the best solution here, esp. in this point of the cycle.
>>>
>>> Just an hour ago he sent me the patch to look over it. And we're at
>>> least two and a half weeks away from v4.16.
>>
>> No we are *only* two and a half weeks away from v4.16 (worst case scenario)
>> and Linus does not like getting last minute fixes.
>
> That doesn't preclude allowing a few hours to discuss things.
> There is never such a rush. In the present case, a new contributor
> was willing to debug the issue and submit a patch. Onboarding new
> contributors is important and IMO it's worth waiting a few days for
> them to sort things out, even if it means a regression stays present
> a little longer. I'm sorry that it meant you wasted time debugging
> it in parallel.
>
> That said, when submitting the patch I clearly failed to notice that
> for devices using autosuspend, pm_request_resume() doesn't update
> the last usage timestamp. While that could be fixed by calling
> pm_runtime_mark_last_busy() before pm_request_resume(), it doesn't
> seem to be customary as a look at all the call sites of
> pm_request_resume() shows. The original three-line sequence,
> although quite verbose, appears to be what is commonly used in such
> a case. For this reason reverting back to the original version
> seems justified.

there is no reason to rush this through. With a properly worded commit message that explain the reason, I have no problem to do a last minute -rc inclusion.

However what I like to have is a single patch with all Acks and also CC: stable tags if required that we can just send off in the direction towards Linus. I saw there is an alternative patch on the mailing list. So I would like to have a good conclusion on what goes to -rc and that we all agree.

Regards

Marcel

2018-03-15 10:23:55

by Hans de Goede

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

Hi,

On 15-03-18 09:14, Lukas Wunner wrote:
> On Thu, Mar 15, 2018 at 08:49:04AM +0100, Hans de Goede wrote:
>> On 14-03-18 23:38, Lukas Wunner wrote:
>>> On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote:
>>>> We're quite far into the cycle already and this is a serious regression,
>>>> also nothing of great value is lost by the revert, the original commit
>>>> was a minor cleanup which turns out to have bad side-effects, a simple
>>>> revert really is the best solution here, esp. in this point of the cycle.
>>>
>>> Just an hour ago he sent me the patch to look over it. And we're at
>>> least two and a half weeks away from v4.16.
>>
>> No we are *only* two and a half weeks away from v4.16 (worst case scenario)
>> and Linus does not like getting last minute fixes.
>
> That doesn't preclude allowing a few hours to discuss things.
> There is never such a rush. In the present case, a new contributor
> was willing to debug the issue and submit a patch. Onboarding new
> contributors is important and IMO it's worth waiting a few days for
> them to sort things out, even if it means a regression stays present
> a little longer.

I completely agree that onboarding new contributors is important.

Regards,

Hans

2018-03-15 08:32:09

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote:
> This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime
> PM code"). The commit msg for this commit states "No functional change
> intended.", but replacing:
>
> pm_runtime_get();
> pm_runtime_mark_last_busy();
> pm_runtime_put_autosuspend();
>
> with:
>
> pm_request_resume();
>
> Does result in a functional change, pm_request_resume() only calls
> pm_runtime_mark_last_busy() if the device was suspended before the call.
>
> This results in the following happening:
>
> 1) Device is runtime suspended
> 2) Device drives host_wake IRQ logically high as it starts receiving data
> 3) bcm_host_wake() gets called, causes the device to runtime-resume,
> current time gets marked as last_busy time
> 4) After 5 seconds the autosuspend timer expires and the dev autosuspends
> as no one has been calling pm_runtime_mark_last_busy(), the device was
> resumed during those 5 seconds, so all the pm_request_resume() calls
> while receiving data and/or bcm_host_wake() calls were nops
> 5) If 4) happens while the device has (just received) data in its buffer to
> be read by the host the IRQ line is *already* / still logically high
> when we autosuspend and since we use an edge triggered IRQ, the IRQ
> will never trigger, causing the device to get stuck in suspend
>
> Therefor this commit has to be reverted, so that we avoid the device
> getting stuck in suspend.
>
> Cc: Lukas Wunner <[email protected]>
> Signed-off-by: Hans de Goede <[email protected]>

Acked-by: Lukas Wunner <[email protected]>

2018-03-15 08:14:08

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

On Thu, Mar 15, 2018 at 08:49:04AM +0100, Hans de Goede wrote:
> On 14-03-18 23:38, Lukas Wunner wrote:
> > On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote:
> > >We're quite far into the cycle already and this is a serious regression,
> > >also nothing of great value is lost by the revert, the original commit
> > >was a minor cleanup which turns out to have bad side-effects, a simple
> > >revert really is the best solution here, esp. in this point of the cycle.
> >
> > Just an hour ago he sent me the patch to look over it. And we're at
> > least two and a half weeks away from v4.16.
>
> No we are *only* two and a half weeks away from v4.16 (worst case scenario)
> and Linus does not like getting last minute fixes.

That doesn't preclude allowing a few hours to discuss things.
There is never such a rush. In the present case, a new contributor
was willing to debug the issue and submit a patch. Onboarding new
contributors is important and IMO it's worth waiting a few days for
them to sort things out, even if it means a regression stays present
a little longer. I'm sorry that it meant you wasted time debugging
it in parallel.

That said, when submitting the patch I clearly failed to notice that
for devices using autosuspend, pm_request_resume() doesn't update
the last usage timestamp. While that could be fixed by calling
pm_runtime_mark_last_busy() before pm_request_resume(), it doesn't
seem to be customary as a look at all the call sites of
pm_request_resume() shows. The original three-line sequence,
although quite verbose, appears to be what is commonly used in such
a case. For this reason reverting back to the original version
seems justified.

Thanks,

Lukas

2018-03-15 07:49:04

by Hans de Goede

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

Hi,

On 14-03-18 23:38, Lukas Wunner wrote:
> On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote:
>> On 14-03-18 23:16, Lukas Wunner wrote:
>>> On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote:
>>>> This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime
>>>> PM code"). The commit msg for this commit states "No functional change
>>>> intended.", but replacing:
>>>>
>>>> pm_runtime_get();
>>>> pm_runtime_mark_last_busy();
>>>> pm_runtime_put_autosuspend();
>>>>
>>>> with:
>>>>
>>>> pm_request_resume();
>>>>
>>>> Does result in a functional change, pm_request_resume() only calls
>>>> pm_runtime_mark_last_busy() if the device was suspended before the call.
>>>
>>> Yes, Robert Howell (cc) reported this a few days ago:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=198953
>>>
>>> I've worked with him to develop a fix which is better IMHO than a revert,
>>> namely he's replacing the pm_request_resume() in bcm_recv() with
>>> pm_runtime_mark_last_busy(), and the pm_request_resume() in the interrupt
>>> handler can stay. He says that fixes the issue for him.
>>
>> It makes the race window a lot smaller, but it still leaves a race:
>>
>> 1) some data comes in, gets full read from the device
>> 2) 4.9999 seconds elapse since last byte has been read
>> 3) new data comes in, triggers IRQ, IRQ does nothing because runtime suspend
>> has not yet kicked in
>> 4) runtime suspend kicks in, disabling the uart before the first new byte is received
>> 5) stuck again
>
> Hm okay, but a call to pm_runtime_mark_last_busy() before the
> pm_request_resume() should avoid that. Actually I'm wondering
> why we're not calling pm_runtime_mark_last_busy() in rpm_resume()
> if the device was already resumed as clearly an action is requested
> from it. That needs to be investigated separately.
>
>>> I hope he'll submit the patch shortly.
>>
>> We're quite far into the cycle already and this is a serious regression,
>> also nothing of great value is lost by the revert, the original commit
>> was a minor cleanup which turns out to have bad side-effects, a simple
>> revert really is the best solution here, esp. in this point of the cycle.
>
> Just an hour ago he sent me the patch to look over it. And we're at
> least two and a half weeks away from v4.16.

No we are *only* two and a half weeks away from v4.16 (worst case scenario)
and Linus does not like getting last minute fixes.

I really so no good reason to not fix this with a simple revert, esp.
since as my explanation of the race condition in the fix he send you
shows, getting this right is non trivial. Falling back to the code before
the troublesome commit gives us a known working solution, at 0 cost (as
the reverted commit was just a code cleanup, no functionality is lost).

Anyways this is Marcel's call now.

Regards,

Hans

2018-03-14 22:38:13

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

On Wed, Mar 14, 2018 at 11:23:12PM +0100, Hans de Goede wrote:
> On 14-03-18 23:16, Lukas Wunner wrote:
> >On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote:
> >>This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime
> >>PM code"). The commit msg for this commit states "No functional change
> >>intended.", but replacing:
> >>
> >> pm_runtime_get();
> >> pm_runtime_mark_last_busy();
> >> pm_runtime_put_autosuspend();
> >>
> >>with:
> >>
> >> pm_request_resume();
> >>
> >>Does result in a functional change, pm_request_resume() only calls
> >>pm_runtime_mark_last_busy() if the device was suspended before the call.
> >
> >Yes, Robert Howell (cc) reported this a few days ago:
> >https://bugzilla.kernel.org/show_bug.cgi?id=198953
> >
> >I've worked with him to develop a fix which is better IMHO than a revert,
> >namely he's replacing the pm_request_resume() in bcm_recv() with
> >pm_runtime_mark_last_busy(), and the pm_request_resume() in the interrupt
> >handler can stay. He says that fixes the issue for him.
>
> It makes the race window a lot smaller, but it still leaves a race:
>
> 1) some data comes in, gets full read from the device
> 2) 4.9999 seconds elapse since last byte has been read
> 3) new data comes in, triggers IRQ, IRQ does nothing because runtime suspend
> has not yet kicked in
> 4) runtime suspend kicks in, disabling the uart before the first new byte is received
> 5) stuck again

Hm okay, but a call to pm_runtime_mark_last_busy() before the
pm_request_resume() should avoid that. Actually I'm wondering
why we're not calling pm_runtime_mark_last_busy() in rpm_resume()
if the device was already resumed as clearly an action is requested
from it. That needs to be investigated separately.

> >I hope he'll submit the patch shortly.
>
> We're quite far into the cycle already and this is a serious regression,
> also nothing of great value is lost by the revert, the original commit
> was a minor cleanup which turns out to have bad side-effects, a simple
> revert really is the best solution here, esp. in this point of the cycle.

Just an hour ago he sent me the patch to look over it. And we're at
least two and a half weeks away from v4.16.

Thanks,

Lukas

2018-03-14 22:23:12

by Hans de Goede

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

Hi,

On 14-03-18 23:16, Lukas Wunner wrote:
> On Wed, Mar 14, 2018 at 11:06:02PM +0100, Hans de Goede wrote:
>> This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime
>> PM code"). The commit msg for this commit states "No functional change
>> intended.", but replacing:
>>
>> pm_runtime_get();
>> pm_runtime_mark_last_busy();
>> pm_runtime_put_autosuspend();
>>
>> with:
>>
>> pm_request_resume();
>>
>> Does result in a functional change, pm_request_resume() only calls
>> pm_runtime_mark_last_busy() if the device was suspended before the call.
>
> Yes, Robert Howell (cc) reported this a few days ago:
> https://bugzilla.kernel.org/show_bug.cgi?id=198953
>
> I've worked with him to develop a fix which is better IMHO than a revert,
> namely he's replacing the pm_request_resume() in bcm_recv() with
> pm_runtime_mark_last_busy(), and the pm_request_resume() in the interrupt
> handler can stay. He says that fixes the issue for him.

It makes the race window a lot smaller, but it still leaves a race:

1) some data comes in, gets full read from the device
2) 4.9999 seconds elapse since last byte has been read
3) new data comes in, triggers IRQ, IRQ does nothing because runtime suspend
has not yet kicked in
4) runtime suspend kicks in, disabling the uart before the first new byte is received
5) stuck again

> I hope he'll submit the patch shortly.

We're quite far into the cycle already and this is a serious regression,
also nothing of great value is lost by the revert, the original commit
was a minor cleanup which turns out to have bad side-effects, a simple
revert really is the best solution here, esp. in this point of the cycle.

Regards,

Hans

2018-03-14 22:22:44

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters

On Wed, Mar 14, 2018 at 11:06:03PM +0100, Hans de Goede wrote:
> The IRQ output of the bcm bt-device is really a level IRQ signal, which
> signals a logical high as long as the device's buffer contains data. Since
> the draining in the buffer is done in the tty driver, we cannot (easily)
> wait in a threaded interrupt handler for the draining, after which the
> IRQ should go low again.
>
> So instead we treat the IRQ as an edge interrupt. This opens the window
> for a theoretical race where we wakeup, read some data and then autosuspend
> *before* the IRQ has gone (logical) low, followed by the device just at
> that moment receiving more data, causing the IRQ to stay high and we never
> see an edge.
>
> Since we call pm_runtime_mark_last_busy() on every received byte, there
> should be plenty time for the IRQ to go (logical) low before we ever
> suspend, so this should never happen, but after commit 43fff7683468
> ("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted
> since, this was actually happening causing the device to get stuck in
> runtime suspend.
>
> The bcm bt-device actually has a workaround for this, if we set the
> pulsed_host_wake flag in the sleep parameters, then the device monitors
> if the host is draining the buffer and if not then after a timeout the
> device will pulse the IRQ line, causing us to see an edge, fixing the
> stuck in suspend condition.
>
> This commit sets the pulsed_host_wake flag to fix the (mostly theoretical)
> race caused by us treating the IRQ as an edge IRQ.
>
> Signed-off-by: Hans de Goede <[email protected]>

Reviewed-by: Lukas Wunner <[email protected]>

Good work, thanks.

> ---
> drivers/bluetooth/hci_bcm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c
> index b089012a49f9..04b9a6c75c75 100644
> --- a/drivers/bluetooth/hci_bcm.c
> +++ b/drivers/bluetooth/hci_bcm.c
> @@ -307,7 +307,7 @@ static const struct bcm_set_sleep_mode default_sleep_params = {
> .usb_auto_sleep = 0,
> .usb_resume_timeout = 0,
> .break_to_host = 0,
> - .pulsed_host_wake = 0,
> + .pulsed_host_wake = 1,
> };
>
> static int bcm_setup_sleep(struct hci_uart *hu)
> --
> 2.14.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2018-03-14 22:16:03

by Lukas Wunner

[permalink] [raw]

Subject: Re: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

2018-03-14 22:06:03

by Hans de Goede

[permalink] [raw]

Subject: [PATCH 4.16 REGRESSION fix 2/2] Bluetooth: hci_bcm: Set pulsed_host_wake flag in sleep parameters

The IRQ output of the bcm bt-device is really a level IRQ signal, which
signals a logical high as long as the device's buffer contains data. Since
the draining in the buffer is done in the tty driver, we cannot (easily)
wait in a threaded interrupt handler for the draining, after which the
IRQ should go low again.

So instead we treat the IRQ as an edge interrupt. This opens the window
for a theoretical race where we wakeup, read some data and then autosuspend
*before* the IRQ has gone (logical) low, followed by the device just at
that moment receiving more data, causing the IRQ to stay high and we never
see an edge.

Since we call pm_runtime_mark_last_busy() on every received byte, there
should be plenty time for the IRQ to go (logical) low before we ever
suspend, so this should never happen, but after commit 43fff7683468
("Bluetooth: hci_bcm: Streamline runtime PM code"), which has been reverted
since, this was actually happening causing the device to get stuck in
runtime suspend.

The bcm bt-device actually has a workaround for this, if we set the
pulsed_host_wake flag in the sleep parameters, then the device monitors
if the host is draining the buffer and if not then after a timeout the
device will pulse the IRQ line, causing us to see an edge, fixing the
stuck in suspend condition.

This commit sets the pulsed_host_wake flag to fix the (mostly theoretical)
race caused by us treating the IRQ as an edge IRQ.

Signed-off-by: Hans de Goede <[email protected]>
---
drivers/bluetooth/hci_bcm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c
index b089012a49f9..04b9a6c75c75 100644
--- a/drivers/bluetooth/hci_bcm.c
+++ b/drivers/bluetooth/hci_bcm.c
@@ -307,7 +307,7 @@ static const struct bcm_set_sleep_mode default_sleep_params = {
.usb_auto_sleep = 0,
.usb_resume_timeout = 0,
.break_to_host = 0,
- .pulsed_host_wake = 0,
+ .pulsed_host_wake = 1,
};

static int bcm_setup_sleep(struct hci_uart *hu)
--
2.14.3

2018-03-14 22:06:02

by Hans de Goede

[permalink] [raw]

Subject: [PATCH 4.16 REGRESSION fix 1/2] Revert "Bluetooth: hci_bcm: Streamline runtime PM code"

This reverts commit 43fff7683468 ("Bluetooth: hci_bcm: Streamline runtime
PM code"). The commit msg for this commit states "No functional change
intended.", but replacing:

pm_runtime_get();
pm_runtime_mark_last_busy();
pm_runtime_put_autosuspend();

with:

pm_request_resume();

Does result in a functional change, pm_request_resume() only calls
pm_runtime_mark_last_busy() if the device was suspended before the call.

This results in the following happening:

1) Device is runtime suspended
2) Device drives host_wake IRQ logically high as it starts receiving data
3) bcm_host_wake() gets called, causes the device to runtime-resume,
current time gets marked as last_busy time
4) After 5 seconds the autosuspend timer expires and the dev autosuspends
as no one has been calling pm_runtime_mark_last_busy(), the device was
resumed during those 5 seconds, so all the pm_request_resume() calls
while receiving data and/or bcm_host_wake() calls were nops
5) If 4) happens while the device has (just received) data in its buffer to
be read by the host the IRQ line is *already* / still logically high
when we autosuspend and since we use an edge triggered IRQ, the IRQ
will never trigger, causing the device to get stuck in suspend

Therefor this commit has to be reverted, so that we avoid the device
getting stuck in suspend.

Cc: Lukas Wunner <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
---
drivers/bluetooth/hci_bcm.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c
index 59804a35488c..b089012a49f9 100644
--- a/drivers/bluetooth/hci_bcm.c
+++ b/drivers/bluetooth/hci_bcm.c
@@ -248,7 +248,9 @@ static irqreturn_t bcm_host_wake(int irq, void *data)

bt_dev_dbg(bdev, "Host wake IRQ");

- pm_request_resume(bdev->dev);
+ pm_runtime_get(bdev->dev);
+ pm_runtime_mark_last_busy(bdev->dev);
+ pm_runtime_put_autosuspend(bdev->dev);

return IRQ_HANDLED;
}
@@ -590,8 +592,11 @@ static int bcm_recv(struct hci_uart *hu, const void *data, int count)
} else if (!bcm->rx_skb) {
/* Delay auto-suspend when receiving completed packet */
mutex_lock(&bcm_device_lock);
- if (bcm->dev && bcm_device_exists(bcm->dev))
- pm_request_resume(bcm->dev->dev);
+ if (bcm->dev && bcm_device_exists(bcm->dev)) {
+ pm_runtime_get(bcm->dev->dev);
+ pm_runtime_mark_last_busy(bcm->dev->dev);
+ pm_runtime_put_autosuspend(bcm->dev->dev);
+ }
mutex_unlock(&bcm_device_lock);
}

--
2.14.3