2016-10-03 22:22:37

by Jérôme de Bretagne

[permalink] [raw]
Subject: Bluetooth not working anymore due to too agressive PM in hci_bcm ? (was: [PATCH v5 5/5] Bluetooth: hci_bcm: Add suspend/resume runtime PM functions)

Hi Frederic, Hi Marcel,

I've been trying to figure this out for quite some time: since the Bluetooth
BCM chipset of my ThinkPad 8 tablet was added into Linux kernel 4.6,
Bluetooth has never been working fully as expected.

To describe the issue: Bluetooth devices (keyboard, mouse) would work and
keep connected *only* if a device scan was maintained running in the
background. In my setup, the trick was achieved by having the Gnome
Bluetooth setting page opened. Without this trick, devices would simply stop
responding after a short amount of time; my BT keyboard would also seem
disconnected during the session login (despite btattach being properly
launched in the background) and would resume right away with a device scan.

I don't know if this issue is widespread but I've found references to at
least 2 similar cases at [1] & [2], both for products using hci_bcm. In the
bug descriptions, it was stated that kernel 4.3 was working fine while 4.4
wasn't anymore. And indeed, I was able to reproduce this on my side.

By investigating the various changes introduced in 4.4, I've been able to
pinpoint the commit [3] that introduces the regression on my tablet:
   commit: e88ab30d3669f08e94e66e7f926713be93af97fc
   Bluetooth: hci_bcm: Add suspend/resume runtime PM functions

since building a kernel with its parent commit [4] works perfectly fine: 
   commit: 7649faff1cfe4f76dabf78cd53d659d39f65b3c1
   Bluetooth: Remove useless rx_lock spinlock

The code change adding the runtime PM functions introduces an autosleep
delay but it's unclear from the commit description how the BT chipset is
supposed to get out of sleep afterwards.

For 3 different devices at least (Asus T100 Chi, Asus T100TA and Lenovo
ThinkPad Tablet 8), it seems to lead in practice to a regression since the
BT chipset doesn't wake up anymore on normal operations (moving the mouse,
typing a key on the keyboard, etc) breaking Bluetooth from a user PoV. And
maybe a few more products using hci_bcm are impacted too.   

If you have a patch suggestion for a potential fix, I would be glad to give
it a try in the coming days and confirm if it solves the issue or not. 

If a proper fix is going to need quite more time otherwise, could you
consider reverting this specific commit in the meantime? Thanks.

Regards,

Jérôme


[1] https://bugzilla.kernel.org/show_bug.cgi?id=109321#c3 (Comments 3 & 4)
for Asus T100TA
[2] https://bugzilla.kernel.org/show_bug.cgi?id=114561#c1 for Asus T100 Chi
[3] https://git.kernel.org/cgit/linux/kernel/git/bluetooth/bluetooth-next.gi
t/commit/?id=e88ab30d3669f08e94e66e7f926713be93af97fc
[4] https://git.kernel.org/cgit/linux/kernel/git/bluetooth/bluetooth-next.gi
t/commit/?id=7649faff1cfe4f76dabf78cd53d659d39f65b3c1


2016-10-06 07:03:08

by Jérôme de Bretagne

[permalink] [raw]
Subject: Re: Bluetooth not working anymore due to too agressive PM in hci_bcm ? (was: [PATCH v5 5/5] Bluetooth: hci_bcm: Add suspend/resume runtime PM functions)

Hi Loic,

> > I remember Frederic had some gpio/irq polarity issues:>
> > 5cebdfea32b89911d4540440c1c2854a1a3d591e
> > Bluetooth: hci_bcm: Fix IRQ polarity for T100
> > Maybe the ACPI table is not correct on some devices.
>
>
> I had seen that patch indeed. I'm attaching the ACPI table I had shared on
> the list a long time ago for the ThinkPad Tablet 8. Is there anything that
> looks suspicious in there and that you could link to a similar gpio/irq
> polarity issue maybe?
>
> If you think there is a good chance that this could be the issue as well,
> I
> will try to build a similar patch for this ThinkPad product and share my
> results.

Just to let you know, after finding the right DMI values to simply adapt the
above patch for my ThinkPad tablet, I can confirm that a quick local kernel
modification indeed fixes the issue.

I will need a bit more time to propose a proper patch, adding this other
impacted tablet instead of replacing the above Asus one, and supporting
potentially even more products facing the same polarity issue.

Is hard-coding such exceptions / polarity issues manually the only way to go
or could there be a way to detect this programmatically instead?

Regards,
Jérôme

2016-10-05 21:18:19

by Jérôme de Bretagne

[permalink] [raw]
Subject: Re: Bluetooth not working anymore due to too agressive PM in hci_bcm ? (was: [PATCH v5 5/5] Bluetooth: hci_bcm: Add suspend/resume runtime PM functions)

Hi Marcel,

> > Using out-of-band gpio/irq which are retrieved from the ACPI device.
> > wakeup can be initiated by host or controller depending which one wants
> > to transfer hci data.
>
> I am actually curious if HCI remote wakeup might not work. So the HID
> mouse (I assume Classic here) will initiate a HCI Create Connection which
> will result in a HCI Connection Request. So that wakeup event needs to be
> allowed. I am curious if we might not set that correctly.

That's correct, my HID mouse is using standard Bluetooth BR/EDR, not LE. 


> I would need to see the btmon trace from the controller initialization
> (start btmon before calling btattach). I am no longer sure that this has
> been tested with UART based controllers. On USB the remote wakeup works.

Here you go, I'm attaching traces taken with btmon:

1/ btmon.thinkpad8-init: the one you've asked, showing the controller init


and 2 other ones, just in case:

2/ btmon.thinkpad8-connecting: showing the mouse connecting right after the init 1/ so still properly detected at that time and doing some cursor moves

3/ btmon.thinkpad8-not-connecting: traces taken if the mouse is not switched on during the controller initialization, but only a bit later (roughly 30s or 1min). Switching on the mouse then shows simply no trace at all.


> If the events for connection creation and remote wakeup set correctly,
> then it is some GPIO or Broadcom vendor specific thing.

>From the attached traces 1/, do the events seem to be set up correctly
according to you?

Regards,
Jérôme


Attachments:
btmon.thinkpad8-init.btsnoop (52.66 kB)
btmon.thinkpad8-connecting.btsnoop (4.74 kB)
btmon.thinkpad8-not-connecting.btsnoop (342.00 B)
Download all attachments

2016-10-05 21:16:05

by Jérôme de Bretagne

[permalink] [raw]
Subject: Re: Bluetooth not working anymore due to too agressive PM in hci_bcm ? (was: [PATCH v5 5/5] Bluetooth: hci_bcm: Add suspend/resume runtime PM functions)

(trying a different email for Frederic as I got a delivery failure)


Hi Loic,

> Using out-of-band gpio/irq which are retrieved from the ACPI device.
> wakeup can be initiated by host or controller depending which one wants
> to transfer hci data.

Thanks for this explanation.

>
> I'm going to retry on my T100 and let you know, but used to work.

Did you have chance to retry on a T100. Is it a T100TA model by the way, or
a T100 Chi (to make the distinction between the 2 bugzilla kernel entries)?

>
> - What is the state of your interrupt counter ("host_wake") in 
> /proc/interrupts when you try to generate host wake-up events (using 
> mouse...).

Moving the mouse or clicking on it doesn't change the host_wake interrupt
counter in /proc/interrupts after the autosleep period. If I manually
trigger a new scan, then the counter starts increasing, but it freezes again
after the next autosleep.

>
> - Could you confirm you don't meet the issue if you disable runtime-pm 
> via sysfs. You should find the device (BCM43*) as a child node of your 
> tty in /sys/class/tty*. (echo on > power/control).

I can confirm that with the following command, the mouse keeps working and
doesn't suddenly stop after a sleep period.

root@thinkpad8:/sys/class/tty/ttyS1/device/BCM2E55:00# echo on >
power/control


> I remember Frederic had some gpio/irq polarity issues:> 5cebdfea32b89911d4540440c1c2854a1a3d591e
> Bluetooth: hci_bcm: Fix IRQ polarity for T100
> Maybe the ACPI table is not correct on some devices.

I had seen that patch indeed. I'm attaching the ACPI table I had shared on
the list a long time ago for the ThinkPad Tablet 8. Is there anything that
looks suspicious in there and that you could link to a similar gpio/irq
polarity issue maybe?

If you think there is a good chance that this could be the issue as well, I
will try to build a similar patch for this ThinkPad product and share my
results.

Regards,
Jérôme


Attachments:
dsdt.dsl (539.31 kB)

2016-10-04 08:38:46

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Bluetooth not working anymore due to too agressive PM in hci_bcm ? (was: [PATCH v5 5/5] Bluetooth: hci_bcm: Add suspend/resume runtime PM functions)

Hi Loic,

>> since building a kernel with its parent commit [4] works perfectly fine:
>> commit: 7649faff1cfe4f76dabf78cd53d659d39f65b3c1
>> Bluetooth: Remove useless rx_lock spinlock
>>
>> The code change adding the runtime PM functions introduces an autosleep
>> delay but it's unclear from the commit description how the BT chipset is
>> supposed to get out of sleep afterwards.
>
> Using out-of-band gpio/irq which are retrieved from the ACPI device.
> wakeup can be initiated by host or controller depending which one wants
> to transfer hci data.

I am actually curious if HCI remote wakeup might not work. So the HID mouse (I assume Classic here) will initiate a HCI Create Connection which will result in a HCI Connection Request. So that wakeup event needs to be allowed. I am curious if we might not set that correctly.

I would need to see the btmon trace from the controller initialization (start btmon before calling btattach). I am no longer sure that this has been tested with UART based controllers. On USB the remote wakeup works.

If the events for connection creation and remote wakeup set correctly, then it is some GPIO or Broadcom vendor specific thing.

Regards

Marcel


2016-10-04 07:46:31

by Loic Poulain

[permalink] [raw]
Subject: Re: Bluetooth not working anymore due to too agressive PM in hci_bcm ? (was: [PATCH v5 5/5] Bluetooth: hci_bcm: Add suspend/resume runtime PM functions)

Hi Jerome,

>
> since building a kernel with its parent commit [4] works perfectly fine:
> commit: 7649faff1cfe4f76dabf78cd53d659d39f65b3c1
> Bluetooth: Remove useless rx_lock spinlock
>
> The code change adding the runtime PM functions introduces an autosleep
> delay but it's unclear from the commit description how the BT chipset is
> supposed to get out of sleep afterwards.

Using out-of-band gpio/irq which are retrieved from the ACPI device.
wakeup can be initiated by host or controller depending which one wants
to transfer hci data.

>
> For 3 different devices at least (Asus T100 Chi, Asus T100TA and Lenovo
> ThinkPad Tablet 8), it seems to lead in practice to a regression since the
> BT chipset doesn't wake up anymore on normal operations (moving the mouse,
> typing a key on the keyboard, etc) breaking Bluetooth from a user PoV. And
> maybe a few more products using hci_bcm are impacted too.

I'm going to retry on my T100 and let you know, but used to work.

>
> If you have a patch suggestion for a potential fix, I would be glad to give
> it a try in the coming days and confirm if it solves the issue or not.

- What is the state of your interrupt counter ("host_wake") in
/proc/interrupts when you try to generate host wake-up events (using
mouse...).

- Could you confirm you don't meet the issue if you disable runtime-pm
via sysfs. You should find the device (BCM43*) as a child node of your
tty in /sys/class/tty*. (echo on > power/control).

>
> If a proper fix is going to need quite more time otherwise, could you
> consider reverting this specific commit in the meantime? Thanks.

I remember Frederic had some gpio/irq polarity issues:
5cebdfea32b89911d4540440c1c2854a1a3d591e
Bluetooth: hci_bcm: Fix IRQ polarity for T100
Maybe the ACPI table is not correct on some devices.

Regards,
Loic