2013-05-22 03:55:41

by Vinicius Costa Gomes

[permalink] [raw]
Subject: Fwd: [BUG] Set Powered=true doesn't power up the adapter

Hi Alex,

[forgot to add linux-bluetooth]

On Mon, May 20, 2013 at 11:07 PM, Alex Deymo <[email protected]> wrote:
> Hello,
>
> I'm seeing a problem while trying to power on the adapter once in a
> while just after reboot. What happens is that sending a dbus call to
> power on the adapter doesn't return. You can read a more detailed
> description of the bug here [1] but the resume is as follows:
>
> Sending org.freedesktop.DBus.Properties.Set for power up the adapter
> doesn't return (dbus timeouts after a while) just after reboot. Is a
> very rare bug, but happens from time to time. Restarting bluetoothd
> with -n -d to get the logs made it start returning with the error
> org.bluez.Error.Failed to the same call. I don't see any HCI command
> with btmon while doing this, so I think it's not a firmware issue at
> that point (could be some other thing before). From this state, I can
> call Set several times to power up the adapter, but doesn't work.
> What is interesting is that if instead of sending MGMT_OP_SET_POWERED
> from bluetoothd, we try to power up the adapter with hciconfig
> (calling ioctl(ctl, HCIDEVUP, hdev) ) the adapter goes up and
> everything works fine from there. We are running kernel 3.8, I don't
> know if this was already reported in a newer version.
>

I can consistently reproduce it:

1. run bluetoothd
2. rfkill block all
3. test-adapter powered on

(current bluetooth-next)

I only got things to work again after a hciconfig up, which makes me
think that the "rfkill block" confuses the kernel (probably) about the
state of the controller.

It is getting late here, I hope this helps, I will try to investigate
it better tomorrow.

> Hope it helps, I'll try to have more debugging information... but it
> would be useful it someone can tell me where to look at, since this is
> not very easy to reproduce.
> Best regards,
> Alex.
>
> [1] https://code.google.com/p/chromium/issues/detail?id=239896#c11
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Cheers,
--
Vinicius


2013-05-30 02:04:02

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi Alex,

>>> In case hci_dev_open() fails and rfkill would be one example for that,
>>> we do not return an error back to the mgmt command that originates
>>> this power on.
>>>
>>> Of course there is the initial adapter power on and there is the mgmt
>>> triggered power on. And my guess is that these two are racing against
>>> each other.
>>
>> It seems you're right about the cause. I just sent a patch to fix it.
>
> I saw the patch about return an error back to the userspace, but still
> there is something weird here:
> I think I omit an important detail on my first email (ups =D.. but it
> was on our linked bug). After reboot, a power up initiated using the
> mgmt interface from the bluetoothd will not return, but the previously
> connected devices do work. So the adapter is connected to, for
> example, a mouse, and after reboot the adapter will be shown as
> Powered: False, Set Powered doesn't return, but you can happily move
> the bt mouse and it works.
>
> Do you think that the patch about rfkill ( [PATCH] Bluetooth: Fix mgmt
> handling of power on failures ) should be enough to fix this issue?
> I'm still trying to reproduce the problem with dynamic debug enabled
> for bluetooth and launching bluetoothd with MGMT_DEBUG. The machine
> keeps rebooting until it faces the problem again... but no luck so far
> =(
>

so hci_dev_open() can return more than just ERFKILL. It can also return EALREADY and other error. So this patch can indeed fix your issue as well. It is just rfkilled devices can reproduce this behavior pretty easily.

What I am thinking you are hitting is a race between the automatic power on (which goes through the same function) and the management power on. We might have a larger issue with a race condition hitting here, but the missing handling of the error is clearly one. There is the hci_req lock serializing the auto power-on and mgmt and so this should not happen, but we might have a bug somewhere else as well.

We might need a bit better instrumentation on being able to debug this. I mentioned this in some threads before, that we might want to have actually proper tracing support. Similar to what mac80211 does. Hint hint ;)

Johan, we might need to check if other errors need to be reported differently and EALREADY handled more gracefully during init of an adapter.

Regards

Marcel


2013-05-29 21:51:48

by Alex Deymo

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi,

On Tue, May 28, 2013 at 12:13 AM, Johan Hedberg <[email protected]> wrote:
>
> On Wed, May 22, 2013, Marcel Holtmann wrote:
> > In case hci_dev_open() fails and rfkill would be one example for that,
> > we do not return an error back to the mgmt command that originates
> > this power on.
> >
> > Of course there is the initial adapter power on and there is the mgmt
> > triggered power on. And my guess is that these two are racing against
> > each other.
>
> It seems you're right about the cause. I just sent a patch to fix it.

I saw the patch about return an error back to the userspace, but still
there is something weird here:
I think I omit an important detail on my first email (ups =D.. but it
was on our linked bug). After reboot, a power up initiated using the
mgmt interface from the bluetoothd will not return, but the previously
connected devices do work. So the adapter is connected to, for
example, a mouse, and after reboot the adapter will be shown as
Powered: False, Set Powered doesn't return, but you can happily move
the bt mouse and it works.

Do you think that the patch about rfkill ( [PATCH] Bluetooth: Fix mgmt
handling of power on failures ) should be enough to fix this issue?
I'm still trying to reproduce the problem with dynamic debug enabled
for bluetooth and launching bluetoothd with MGMT_DEBUG. The machine
keeps rebooting until it faces the problem again... but no luck so far
=(

Thanks,
Alex.

2013-05-28 08:09:56

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi Johan,

>>>>> If I send any Set Powered command after the controller is unblocked, while
>>>>> there's still one pending command in the queue, this command will fail, and I
>>>>> won't be able to bring the controller up, unless I force a HCI_RESET command.
>>>>>
>>>>> So it seems that we need a way to mark an hdev as rfkill blocked.
>>>>>
>>>>> Does anyone else have any other ideas?
>>>>
>>>> my wild guess right now is that this is the problem:
>>>>
>>>> static void hci_power_on(struct work_struct *work)
>>>> {
>>>> struct hci_dev *hdev = container_of(work, struct hci_dev, power_on);
>>>>
>>>> BT_DBG("%s", hdev->name);
>>>>
>>>> if (hci_dev_open(hdev->id) < 0)
>>>> return;
>>>>
>>>> if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags))
>>>> queue_delayed_work(hdev->req_workqueue, &hdev->power_off,
>>>> HCI_AUTO_OFF_TIMEOUT);
>>>>
>>>> if (test_and_clear_bit(HCI_SETUP, &hdev->dev_flags))
>>>> mgmt_index_added(hdev);
>>>> }
>>>>
>>>> In case hci_dev_open() fails and rfkill would be one example for that,
>>>> we do not return an error back to the mgmt command that originates
>>>> this power on.
>>>>
>>>> Of course there is the initial adapter power on and there is the mgmt
>>>> triggered power on. And my guess is that these two are racing against
>>>> each other.
>>>
>>> It seems you're right about the cause. I just sent a patch to fix it.
>>> With the patch I get this behavior:
>>>
>>> jh@qemu~/src/bluez{master}$ sudo rfkill list
>>> 0: hci0: Bluetooth
>>> Soft blocked: no
>>> Hard blocked: no
>>> jh@qemu~/src/bluez{master}$ sudo rfkill block 0
>>> jh@qemu~/src/bluez{master}$ sudo tools/btmgmt power on
>>> Set Powered for hci0 failed with status 0x03 (Failed)
>>
>> we could be a bit smarter about the error code here. At least when
>> ERFKILL we should return a more descriptive error via mgmt.
>
> I'm fine with that, and I actually left room for it by passing the exact
> error to the new mgmt_set_powered_failed function. Would you prefer a
> completely new error only used for rfkill or reuse one of our existing
> errors. We've e.g. got MGMT_STATUS_REJECTED which could be used for
> rfkill and MGMT_STATUS_FAILED for anything else.

I think a brand new error might be a good idea. RFKILL has always been special.

Regards

Marcel


2013-05-28 07:49:06

by Johan Hedberg

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi Marcel,

On Tue, May 28, 2013, Marcel Holtmann wrote:
> >>> If I send any Set Powered command after the controller is unblocked, while
> >>> there's still one pending command in the queue, this command will fail, and I
> >>> won't be able to bring the controller up, unless I force a HCI_RESET command.
> >>>
> >>> So it seems that we need a way to mark an hdev as rfkill blocked.
> >>>
> >>> Does anyone else have any other ideas?
> >>
> >> my wild guess right now is that this is the problem:
> >>
> >> static void hci_power_on(struct work_struct *work)
> >> {
> >> struct hci_dev *hdev = container_of(work, struct hci_dev, power_on);
> >>
> >> BT_DBG("%s", hdev->name);
> >>
> >> if (hci_dev_open(hdev->id) < 0)
> >> return;
> >>
> >> if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags))
> >> queue_delayed_work(hdev->req_workqueue, &hdev->power_off,
> >> HCI_AUTO_OFF_TIMEOUT);
> >>
> >> if (test_and_clear_bit(HCI_SETUP, &hdev->dev_flags))
> >> mgmt_index_added(hdev);
> >> }
> >>
> >> In case hci_dev_open() fails and rfkill would be one example for that,
> >> we do not return an error back to the mgmt command that originates
> >> this power on.
> >>
> >> Of course there is the initial adapter power on and there is the mgmt
> >> triggered power on. And my guess is that these two are racing against
> >> each other.
> >
> > It seems you're right about the cause. I just sent a patch to fix it.
> > With the patch I get this behavior:
> >
> > jh@qemu~/src/bluez{master}$ sudo rfkill list
> > 0: hci0: Bluetooth
> > Soft blocked: no
> > Hard blocked: no
> > jh@qemu~/src/bluez{master}$ sudo rfkill block 0
> > jh@qemu~/src/bluez{master}$ sudo tools/btmgmt power on
> > Set Powered for hci0 failed with status 0x03 (Failed)
>
> we could be a bit smarter about the error code here. At least when
> ERFKILL we should return a more descriptive error via mgmt.

I'm fine with that, and I actually left room for it by passing the exact
error to the new mgmt_set_powered_failed function. Would you prefer a
completely new error only used for rfkill or reuse one of our existing
errors. We've e.g. got MGMT_STATUS_REJECTED which could be used for
rfkill and MGMT_STATUS_FAILED for anything else.

Johan

2013-05-28 07:36:51

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi Johan,

>>> If I send any Set Powered command after the controller is unblocked, while
>>> there's still one pending command in the queue, this command will fail, and I
>>> won't be able to bring the controller up, unless I force a HCI_RESET command.
>>>
>>> So it seems that we need a way to mark an hdev as rfkill blocked.
>>>
>>> Does anyone else have any other ideas?
>>
>> my wild guess right now is that this is the problem:
>>
>> static void hci_power_on(struct work_struct *work)
>> {
>> struct hci_dev *hdev = container_of(work, struct hci_dev, power_on);
>>
>> BT_DBG("%s", hdev->name);
>>
>> if (hci_dev_open(hdev->id) < 0)
>> return;
>>
>> if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags))
>> queue_delayed_work(hdev->req_workqueue, &hdev->power_off,
>> HCI_AUTO_OFF_TIMEOUT);
>>
>> if (test_and_clear_bit(HCI_SETUP, &hdev->dev_flags))
>> mgmt_index_added(hdev);
>> }
>>
>> In case hci_dev_open() fails and rfkill would be one example for that,
>> we do not return an error back to the mgmt command that originates
>> this power on.
>>
>> Of course there is the initial adapter power on and there is the mgmt
>> triggered power on. And my guess is that these two are racing against
>> each other.
>
> It seems you're right about the cause. I just sent a patch to fix it.
> With the patch I get this behavior:
>
> jh@qemu~/src/bluez{master}$ sudo rfkill list
> 0: hci0: Bluetooth
> Soft blocked: no
> Hard blocked: no
> jh@qemu~/src/bluez{master}$ sudo rfkill block 0
> jh@qemu~/src/bluez{master}$ sudo tools/btmgmt power on
> Set Powered for hci0 failed with status 0x03 (Failed)

we could be a bit smarter about the error code here. At least when ERFKILL we should return a more descriptive error via mgmt.

Regards

Marcel


2013-05-28 07:13:06

by Johan Hedberg

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi Marcel,

On Wed, May 22, 2013, Marcel Holtmann wrote:
> > If I send any Set Powered command after the controller is unblocked, while
> > there's still one pending command in the queue, this command will fail, and I
> > won't be able to bring the controller up, unless I force a HCI_RESET command.
> >
> > So it seems that we need a way to mark an hdev as rfkill blocked.
> >
> > Does anyone else have any other ideas?
>
> my wild guess right now is that this is the problem:
>
> static void hci_power_on(struct work_struct *work)
> {
> struct hci_dev *hdev = container_of(work, struct hci_dev, power_on);
>
> BT_DBG("%s", hdev->name);
>
> if (hci_dev_open(hdev->id) < 0)
> return;
>
> if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags))
> queue_delayed_work(hdev->req_workqueue, &hdev->power_off,
> HCI_AUTO_OFF_TIMEOUT);
>
> if (test_and_clear_bit(HCI_SETUP, &hdev->dev_flags))
> mgmt_index_added(hdev);
> }
>
> In case hci_dev_open() fails and rfkill would be one example for that,
> we do not return an error back to the mgmt command that originates
> this power on.
>
> Of course there is the initial adapter power on and there is the mgmt
> triggered power on. And my guess is that these two are racing against
> each other.

It seems you're right about the cause. I just sent a patch to fix it.
With the patch I get this behavior:

jh@qemu~/src/bluez{master}$ sudo rfkill list
0: hci0: Bluetooth
Soft blocked: no
Hard blocked: no
jh@qemu~/src/bluez{master}$ sudo rfkill block 0
jh@qemu~/src/bluez{master}$ sudo tools/btmgmt power on
Set Powered for hci0 failed with status 0x03 (Failed)

Johan

2013-05-22 17:49:32

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi Vinicius,

>>>> I'm seeing a problem while trying to power on the adapter once in a
>>>> while just after reboot. What happens is that sending a dbus call to
>>>> power on the adapter doesn't return. You can read a more detailed
>>>> description of the bug here [1] but the resume is as follows:
>>>>
>>>> Sending org.freedesktop.DBus.Properties.Set for power up the adapter
>>>> doesn't return (dbus timeouts after a while) just after reboot. Is a
>>>> very rare bug, but happens from time to time. Restarting bluetoothd
>>>> with -n -d to get the logs made it start returning with the error
>>>> org.bluez.Error.Failed to the same call. I don't see any HCI command
>>>> with btmon while doing this, so I think it's not a firmware issue at
>>>> that point (could be some other thing before). From this state, I can
>>>> call Set several times to power up the adapter, but doesn't work.
>>>> What is interesting is that if instead of sending MGMT_OP_SET_POWERED
>>>> from bluetoothd, we try to power up the adapter with hciconfig
>>>> (calling ioctl(ctl, HCIDEVUP, hdev) ) the adapter goes up and
>>>> everything works fine from there. We are running kernel 3.8, I don't
>>>> know if this was already reported in a newer version.
>>>>
>>>
>>> I can consistently reproduce it:
>>>
>>> 1. run bluetoothd
>>> 2. rfkill block all
>>> 3. test-adapter powered on
>>>
>>> (current bluetooth-next)
>>>
>>> I only got things to work again after a hciconfig up, which makes me
>>> think that the "rfkill block" confuses the kernel (probably) about the
>>> state of the controller.
>>
>> what rfkill block does is to bring the controller down and actually refuse to bring it back up until rfkill unblock happens. If we can bring the controller up via mgmt, then we have a bug.
>
> Seems that we have just one bug, when the controller is rfkill blocked, mgmt
> Set Powered (for example) doesn't return (no Command Complete, no Command
> Status).
>
> If I send any Set Powered command after the controller is unblocked, while
> there's still one pending command in the queue, this command will fail, and I
> won't be able to bring the controller up, unless I force a HCI_RESET command.
>
> So it seems that we need a way to mark an hdev as rfkill blocked.
>
> Does anyone else have any other ideas?

my wild guess right now is that this is the problem:

static void hci_power_on(struct work_struct *work)
{
struct hci_dev *hdev = container_of(work, struct hci_dev, power_on);

BT_DBG("%s", hdev->name);

if (hci_dev_open(hdev->id) < 0)
return;

if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags))
queue_delayed_work(hdev->req_workqueue, &hdev->power_off,
HCI_AUTO_OFF_TIMEOUT);

if (test_and_clear_bit(HCI_SETUP, &hdev->dev_flags))
mgmt_index_added(hdev);
}

In case hci_dev_open() fails and rfkill would be one example for that, we do not return an error back to the mgmt command that originates this power on.

Of course there is the initial adapter power on and there is the mgmt triggered power on. And my guess is that these two are racing against each other.

Regards

Marcel


2013-05-22 17:22:13

by Vinicius Costa Gomes

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi Marcel,

On 08:47 Wed 22 May, Marcel Holtmann wrote:
> Hi Vinicius,
>
> >> I'm seeing a problem while trying to power on the adapter once in a
> >> while just after reboot. What happens is that sending a dbus call to
> >> power on the adapter doesn't return. You can read a more detailed
> >> description of the bug here [1] but the resume is as follows:
> >>
> >> Sending org.freedesktop.DBus.Properties.Set for power up the adapter
> >> doesn't return (dbus timeouts after a while) just after reboot. Is a
> >> very rare bug, but happens from time to time. Restarting bluetoothd
> >> with -n -d to get the logs made it start returning with the error
> >> org.bluez.Error.Failed to the same call. I don't see any HCI command
> >> with btmon while doing this, so I think it's not a firmware issue at
> >> that point (could be some other thing before). From this state, I can
> >> call Set several times to power up the adapter, but doesn't work.
> >> What is interesting is that if instead of sending MGMT_OP_SET_POWERED
> >> from bluetoothd, we try to power up the adapter with hciconfig
> >> (calling ioctl(ctl, HCIDEVUP, hdev) ) the adapter goes up and
> >> everything works fine from there. We are running kernel 3.8, I don't
> >> know if this was already reported in a newer version.
> >>
> >
> > I can consistently reproduce it:
> >
> > 1. run bluetoothd
> > 2. rfkill block all
> > 3. test-adapter powered on
> >
> > (current bluetooth-next)
> >
> > I only got things to work again after a hciconfig up, which makes me
> > think that the "rfkill block" confuses the kernel (probably) about the
> > state of the controller.
>
> what rfkill block does is to bring the controller down and actually refuse to bring it back up until rfkill unblock happens. If we can bring the controller up via mgmt, then we have a bug.

Seems that we have just one bug, when the controller is rfkill blocked, mgmt
Set Powered (for example) doesn't return (no Command Complete, no Command
Status).

If I send any Set Powered command after the controller is unblocked, while
there's still one pending command in the queue, this command will fail, and I
won't be able to bring the controller up, unless I force a HCI_RESET command.

So it seems that we need a way to mark an hdev as rfkill blocked.

Does anyone else have any other ideas?

>
> Regards
>
> Marcel
>


Cheers,
--
Vinicius

2013-05-22 06:47:31

by Marcel Holtmann

[permalink] [raw]
Subject: Re: [BUG] Set Powered=true doesn't power up the adapter

Hi Vinicius,

>> I'm seeing a problem while trying to power on the adapter once in a
>> while just after reboot. What happens is that sending a dbus call to
>> power on the adapter doesn't return. You can read a more detailed
>> description of the bug here [1] but the resume is as follows:
>>
>> Sending org.freedesktop.DBus.Properties.Set for power up the adapter
>> doesn't return (dbus timeouts after a while) just after reboot. Is a
>> very rare bug, but happens from time to time. Restarting bluetoothd
>> with -n -d to get the logs made it start returning with the error
>> org.bluez.Error.Failed to the same call. I don't see any HCI command
>> with btmon while doing this, so I think it's not a firmware issue at
>> that point (could be some other thing before). From this state, I can
>> call Set several times to power up the adapter, but doesn't work.
>> What is interesting is that if instead of sending MGMT_OP_SET_POWERED
>> from bluetoothd, we try to power up the adapter with hciconfig
>> (calling ioctl(ctl, HCIDEVUP, hdev) ) the adapter goes up and
>> everything works fine from there. We are running kernel 3.8, I don't
>> know if this was already reported in a newer version.
>>
>
> I can consistently reproduce it:
>
> 1. run bluetoothd
> 2. rfkill block all
> 3. test-adapter powered on
>
> (current bluetooth-next)
>
> I only got things to work again after a hciconfig up, which makes me
> think that the "rfkill block" confuses the kernel (probably) about the
> state of the controller.

what rfkill block does is to bring the controller down and actually refuse to bring it back up until rfkill unblock happens. If we can bring the controller up via mgmt, then we have a bug.

Regards

Marcel