On Thu, 7 Apr 2011, Thomas Gleixner wrote:
> On Wed, 6 Apr 2011, Keith Packard wrote:
> > -rc2 was causing resume failures on my X201s -- the video mode would get
> > reset, but the system would then lock up. I bisected the failure down to:
> >
> > commit b77dcf8460ae57d4eb9fd3633eb4f97b8fb20716
> > Author: Thomas Gleixner <[email protected]>
> > Date: Thu Mar 24 20:16:42 2011 +0100
> >
> > Bluetooth: Fix warning with hci_cmd_timer
> >
> > Reverting this on top of -rc2 makes resume reliable again.
>
> Huch. I really have a hard time to connect this. Can you revert the
> two hunks seperately?
Keith confirmed that reverting the second hunk fixes the
problem. Heck, so that timer is rearmed somewhere in the teardown
path?
Can the bluetooth folks please have a look at that ASAP? The obvious
fast fix for Linus tree is to revert the second hunk for now, but this
needs to be fixed proper.
Thanks,
tglx
* Keith Packard <[email protected]> [2011-04-21 17:04:51 -0700]:
>
> On Thu, 21 Apr 2011 19:59:53 -0300, "Gustavo F. Padovan" <[email protected]> wrote:
>
> > Patch is on the way to mainline. It's currently on wireless-2.6. Next steps
> > are net-2.6 -> linux-2.6.
>
> I didn't see it on linux-kernel, so I was a bit worried. I'd love to
> test it in place of the kludge that Vinicius sent me.
I usually don't cc linux-kernel in my pull-requests, maybe I should.
--
Gustavo F. Padovan
http://profusion.mobi
Hi Keith,
* Keith Packard <[email protected]> [2011-04-21 14:50:32 -0700]:
> On Mon, 11 Apr 2011 13:59:23 -0300, Vinicius Gomes <[email protected]> wrote:
>
> > Will send a proper fix soon. Thanks all.
>
> This is still broken in -rc4, I'd love to see a proper patch soon --
> this will affect any machine with a bluetooth device.
Patch is on the way to mainline. It's currently on wireless-2.6. Next steps
are net-2.6 -> linux-2.6.
Regards,
--
Gustavo F. Padovan
http://profusion.mobi
On Tue, 12 Apr 2011, Gustavo F. Padovan wrote:
> * Gustavo F. Padovan <[email protected]> [2011-04-11 19:25:04 -0300]:
> > * Thomas Gleixner <[email protected]> [2011-04-12 00:19:32 +0200]:
> > It is armed at each HCI command we send. In the close path we send out an HCI
> > RESET command that rearms it.
>
> I applied v2 patch from Vin?cius that fix all the symptoms. Now we have more time
> to find the real cause of this bug. However I still have no idea, I'm not able
> to reproduce it.
You might poke Ingo. He had a reproducer.
Thanks,
tglx
* Gustavo F. Padovan <[email protected]> [2011-04-11 19:25:04 -0300]:
> * Thomas Gleixner <[email protected]> [2011-04-12 00:19:32 +0200]:
>=20
> > On Tue, 12 Apr 2011, Thomas Gleixner wrote:
> > > On Mon, 11 Apr 2011, Marcel Holtmann wrote:
> > >=20
> > > > Hi Thomas,
> > > >=20
> > > > > > > > Can the bluetooth folks please have a look at that ASAP? Th=
e obvious
> > > > > > > > fast fix for Linus tree is to revert the second hunk for no=
w, but this
> > > > > > > > needs to be fixed proper.
> > > > > > >=20
> > > > > > > Who will submit this patch? I'd rather have your name on it s=
o that
> > > > > > > people come complain at you...
> > > > > >=20
> > > > > > I took a shot at it and just sent a patch (also attached for co=
nvenience)=20
> > > > > > that should solve the problem.
> > > > >=20
> > > > > Aaarg. No. That patch reverts both hunks.
> > > > >=20
> > > > > --- a/net/bluetooth/hci_core.c
> > > > > +++ b/net/bluetooth/hci_core.c
> > > > > @@ -586,9 +586,6 @@ static int hci_dev_do_close(struct hci_dev *h=
dev)
> > > > > hci_req_cancel(hdev, ENODEV);
> > > > > hci_req_lock(hdev);
> > > > > =20
> > > > > - /* Stop timer, it might be running */
> > > > > - del_timer_sync(&hdev->cmd_timer);
> > > > > -
> > > > > if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
> > > > > hci_req_unlock(hdev);
> > > > > return 0;
> > > > >=20
> > > > > As I said before you need that first hunk to stay for the case wh=
ere
> > > > > there is no device up and you return via the !HCI_UP check. You j=
ust
> > > > > moved back to the state before as the stupid timer is active for
> > > > > whatever reason even when HCI_UP is not set.
> > > >=20
> > > > if I read this right then we have the case that we arm this timer f=
or no
> > > > real reason. A device in !HCI_UP should have nothing running. Certa=
inly
> > > > not the cmd_timer since it will never process any commands.
> > > >=20
> > > > According to Gustavo, the problem is really in the hci_reset logic =
were
> > > > we arm the timer even when shutting down the device.
> > >=20
> > > The reason why the original patch was sent is, that the timer was
> > > running when the thing went out via the !HCI_UP path, which caused the
> > > whole thing to explode in the first place. I had no time to figure out
> > > why, but moving the del_timer_sync above the
> > > if (!test_and_clear_bit(HCI_UP, &hdev->flags)) solved it.
> >=20
> > Oops. Hit send too fast.
> >=20
> > Then it broke the resume on Keith machine and reverting just the hunk
> > which disarms the timer in the=20
> >=20
> > if (hdev->sent_cmd) {
> >=20
> > path made both scenarios working. So there are two problems:
> >=20
> > 1) Why do we need the del_timer_sync() above the !HCI_UP check
>=20
> That is still a mysterious to me, the real bug the hiding here. I'm tryin=
g to
> track this down but no luck yet.
>=20
> >=20
> > 2) Why gets the timer rearmed after that
>=20
> It is armed at each HCI command we send. In the close path we send out an=
HCI
> RESET command that rearms it.
I applied v2 patch from Vin=EDcius that fix all the symptoms. Now we have m=
ore time
to find the real cause of this bug. However I still have no idea, I'm not a=
ble
to reproduce it.
--=20
Gustavo F. Padovan
http://profusion.mobi
* Thomas Gleixner <[email protected]> [2011-04-12 00:19:32 +0200]:
> On Tue, 12 Apr 2011, Thomas Gleixner wrote:
> > On Mon, 11 Apr 2011, Marcel Holtmann wrote:
> >
> > > Hi Thomas,
> > >
> > > > > > > Can the bluetooth folks please have a look at that ASAP? The obvious
> > > > > > > fast fix for Linus tree is to revert the second hunk for now, but this
> > > > > > > needs to be fixed proper.
> > > > > >
> > > > > > Who will submit this patch? I'd rather have your name on it so that
> > > > > > people come complain at you...
> > > > >
> > > > > I took a shot at it and just sent a patch (also attached for convenience)
> > > > > that should solve the problem.
> > > >
> > > > Aaarg. No. That patch reverts both hunks.
> > > >
> > > > --- a/net/bluetooth/hci_core.c
> > > > +++ b/net/bluetooth/hci_core.c
> > > > @@ -586,9 +586,6 @@ static int hci_dev_do_close(struct hci_dev *hdev)
> > > > hci_req_cancel(hdev, ENODEV);
> > > > hci_req_lock(hdev);
> > > >
> > > > - /* Stop timer, it might be running */
> > > > - del_timer_sync(&hdev->cmd_timer);
> > > > -
> > > > if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
> > > > hci_req_unlock(hdev);
> > > > return 0;
> > > >
> > > > As I said before you need that first hunk to stay for the case where
> > > > there is no device up and you return via the !HCI_UP check. You just
> > > > moved back to the state before as the stupid timer is active for
> > > > whatever reason even when HCI_UP is not set.
> > >
> > > if I read this right then we have the case that we arm this timer for no
> > > real reason. A device in !HCI_UP should have nothing running. Certainly
> > > not the cmd_timer since it will never process any commands.
> > >
> > > According to Gustavo, the problem is really in the hci_reset logic were
> > > we arm the timer even when shutting down the device.
> >
> > The reason why the original patch was sent is, that the timer was
> > running when the thing went out via the !HCI_UP path, which caused the
> > whole thing to explode in the first place. I had no time to figure out
> > why, but moving the del_timer_sync above the
> > if (!test_and_clear_bit(HCI_UP, &hdev->flags)) solved it.
>
> Oops. Hit send too fast.
>
> Then it broke the resume on Keith machine and reverting just the hunk
> which disarms the timer in the
>
> if (hdev->sent_cmd) {
>
> path made both scenarios working. So there are two problems:
>
> 1) Why do we need the del_timer_sync() above the !HCI_UP check
That is still a mysterious to me, the real bug the hiding here. I'm trying to
track this down but no luck yet.
>
> 2) Why gets the timer rearmed after that
It is armed at each HCI command we send. In the close path we send out an HCI
RESET command that rearms it.
--
Gustavo F. Padovan
http://profusion.mobi
On Tue, 12 Apr 2011, Thomas Gleixner wrote:
> On Mon, 11 Apr 2011, Marcel Holtmann wrote:
>
> > Hi Thomas,
> >
> > > > > > Can the bluetooth folks please have a look at that ASAP? The obvious
> > > > > > fast fix for Linus tree is to revert the second hunk for now, but this
> > > > > > needs to be fixed proper.
> > > > >
> > > > > Who will submit this patch? I'd rather have your name on it so that
> > > > > people come complain at you...
> > > >
> > > > I took a shot at it and just sent a patch (also attached for convenience)
> > > > that should solve the problem.
> > >
> > > Aaarg. No. That patch reverts both hunks.
> > >
> > > --- a/net/bluetooth/hci_core.c
> > > +++ b/net/bluetooth/hci_core.c
> > > @@ -586,9 +586,6 @@ static int hci_dev_do_close(struct hci_dev *hdev)
> > > hci_req_cancel(hdev, ENODEV);
> > > hci_req_lock(hdev);
> > >
> > > - /* Stop timer, it might be running */
> > > - del_timer_sync(&hdev->cmd_timer);
> > > -
> > > if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
> > > hci_req_unlock(hdev);
> > > return 0;
> > >
> > > As I said before you need that first hunk to stay for the case where
> > > there is no device up and you return via the !HCI_UP check. You just
> > > moved back to the state before as the stupid timer is active for
> > > whatever reason even when HCI_UP is not set.
> >
> > if I read this right then we have the case that we arm this timer for no
> > real reason. A device in !HCI_UP should have nothing running. Certainly
> > not the cmd_timer since it will never process any commands.
> >
> > According to Gustavo, the problem is really in the hci_reset logic were
> > we arm the timer even when shutting down the device.
>
> The reason why the original patch was sent is, that the timer was
> running when the thing went out via the !HCI_UP path, which caused the
> whole thing to explode in the first place. I had no time to figure out
> why, but moving the del_timer_sync above the
> if (!test_and_clear_bit(HCI_UP, &hdev->flags)) solved it.
Oops. Hit send too fast.
Then it broke the resume on Keith machine and reverting just the hunk
which disarms the timer in the
if (hdev->sent_cmd) {
path made both scenarios working. So there are two problems:
1) Why do we need the del_timer_sync() above the !HCI_UP check
2) Why gets the timer rearmed after that
Thanks,
tglx
On Mon, 11 Apr 2011, Marcel Holtmann wrote:
> Hi Thomas,
>
> > > > > Can the bluetooth folks please have a look at that ASAP? The obvious
> > > > > fast fix for Linus tree is to revert the second hunk for now, but this
> > > > > needs to be fixed proper.
> > > >
> > > > Who will submit this patch? I'd rather have your name on it so that
> > > > people come complain at you...
> > >
> > > I took a shot at it and just sent a patch (also attached for convenience)
> > > that should solve the problem.
> >
> > Aaarg. No. That patch reverts both hunks.
> >
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -586,9 +586,6 @@ static int hci_dev_do_close(struct hci_dev *hdev)
> > hci_req_cancel(hdev, ENODEV);
> > hci_req_lock(hdev);
> >
> > - /* Stop timer, it might be running */
> > - del_timer_sync(&hdev->cmd_timer);
> > -
> > if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
> > hci_req_unlock(hdev);
> > return 0;
> >
> > As I said before you need that first hunk to stay for the case where
> > there is no device up and you return via the !HCI_UP check. You just
> > moved back to the state before as the stupid timer is active for
> > whatever reason even when HCI_UP is not set.
>
> if I read this right then we have the case that we arm this timer for no
> real reason. A device in !HCI_UP should have nothing running. Certainly
> not the cmd_timer since it will never process any commands.
>
> According to Gustavo, the problem is really in the hci_reset logic were
> we arm the timer even when shutting down the device.
The reason why the original patch was sent is, that the timer was
running when the thing went out via the !HCI_UP path, which caused the
whole thing to explode in the first place. I had no time to figure out
why, but moving the del_timer_sync above the
if (!test_and_clear_bit(HCI_UP, &hdev->flags)) solved it.
Thanks,
tglx
Hi Thomas,
> > > > Can the bluetooth folks please have a look at that ASAP? The obvious
> > > > fast fix for Linus tree is to revert the second hunk for now, but this
> > > > needs to be fixed proper.
> > >
> > > Who will submit this patch? I'd rather have your name on it so that
> > > people come complain at you...
> >
> > I took a shot at it and just sent a patch (also attached for convenience)
> > that should solve the problem.
>
> Aaarg. No. That patch reverts both hunks.
>
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -586,9 +586,6 @@ static int hci_dev_do_close(struct hci_dev *hdev)
> hci_req_cancel(hdev, ENODEV);
> hci_req_lock(hdev);
>
> - /* Stop timer, it might be running */
> - del_timer_sync(&hdev->cmd_timer);
> -
> if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
> hci_req_unlock(hdev);
> return 0;
>
> As I said before you need that first hunk to stay for the case where
> there is no device up and you return via the !HCI_UP check. You just
> moved back to the state before as the stupid timer is active for
> whatever reason even when HCI_UP is not set.
if I read this right then we have the case that we arm this timer for no
real reason. A device in !HCI_UP should have nothing running. Certainly
not the cmd_timer since it will never process any commands.
According to Gustavo, the problem is really in the hci_reset logic were
we arm the timer even when shutting down the device.
Regards
Marcel
Hi Gustavo,
On Mon, Apr 11, 2011 at 1:34 PM, Alan Cox <[email protected]> wrote:
>> I took a shot at it and just sent a patch (also attached for convenience)
>> that should solve the problem.
>
> Well its reverting too much but its also looking pretty bogus to me
>
> - /* Stop timer, it might be running */
> - del_timer_sync(&hdev->cmd_timer);
> -
> if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
> hci_req_unlock(hdev);
> return 0;
>
> So you've now got a path where you leave the timer running and didn't
> before - not it appears one that is a good idea.
>
> Certainly not the kind of change that should be considered for a
> regression fix for an rc kernel. It's far too big a behavioural change to
> be safe.
Considering what was said and that this patch didn't hit your public tree yet,
I guess that this patch should be ignored.
Will send a proper fix soon. Thanks all.
>
> Alan
>
Cheers,
--
Vinicius
> I took a shot at it and just sent a patch (also attached for convenience)
> that should solve the problem.
Well its reverting too much but its also looking pretty bogus to me
- /* Stop timer, it might be running */
- del_timer_sync(&hdev->cmd_timer);
-
if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
hci_req_unlock(hdev);
return 0;
So you've now got a path where you leave the timer running and didn't
before - not it appears one that is a good idea.
Certainly not the kind of change that should be considered for a
regression fix for an rc kernel. It's far too big a behavioural change to
be safe.
Alan
On Mon, 11 Apr 2011, Gustavo F. Padovan wrote:
> * Keith Packard <[email protected]> [2011-04-08 19:03:25 -0700]:
>
> > On Fri, 8 Apr 2011 21:08:57 -0300, Vinicius Costa Gomes <[email protected]> wrote:
> >
> > > I took a shot at it and just sent a patch (also attached for convenience)
> > > that should solve the problem.
> >
> > Thanks much! I applied this to -rc2 and it appears to work fine in some
> > brief testing (a dozen suspend/resume cycles)
> >
> > > Reported-by: Keith Packard <[email protected]>
> > > Signed-off-by: Vinicius Costa Gomes <[email protected]>
> >
> > Tested-by: Keith Packard <[email protected]>
>
> I applied the patch to bluetooth-2.6. Thank you all.
Sigh. Is actually anyone trying to read what I wrote ?
On Fri, 8 Apr 2011, Vinicius Costa Gomes wrote:
> Hi Keith,
>
> On 16:13 Fri 08 Apr, Keith Packard wrote:
> > On Fri, 8 Apr 2011 23:44:51 +0200 (CEST), Thomas Gleixner <[email protected]> wrote:
> >
> > > Can the bluetooth folks please have a look at that ASAP? The obvious
> > > fast fix for Linus tree is to revert the second hunk for now, but this
> > > needs to be fixed proper.
> >
> > Who will submit this patch? I'd rather have your name on it so that
> > people come complain at you...
>
> I took a shot at it and just sent a patch (also attached for convenience)
> that should solve the problem.
Aaarg. No. That patch reverts both hunks.
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -586,9 +586,6 @@ static int hci_dev_do_close(struct hci_dev *hdev)
hci_req_cancel(hdev, ENODEV);
hci_req_lock(hdev);
- /* Stop timer, it might be running */
- del_timer_sync(&hdev->cmd_timer);
-
if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
hci_req_unlock(hdev);
return 0;
As I said before you need that first hunk to stay for the case where
there is no device up and you return via the !HCI_UP check. You just
moved back to the state before as the stupid timer is active for
whatever reason even when HCI_UP is not set.
@@ -618,6 +615,9 @@ static int hci_dev_do_close(struct hci_dev *hdev)
clear_bit(HCI_INIT, &hdev->flags);
}
+ /* Stop timer, it might be running */
+ del_timer_sync(&hdev->cmd_timer);
+
/* Kill cmd task */
tasklet_kill(&hdev->cmd_task);
* Keith Packard <[email protected]> [2011-04-08 19:03:25 -0700]:
> On Fri, 8 Apr 2011 21:08:57 -0300, Vinicius Costa Gomes <[email protected]> wrote:
>
> > I took a shot at it and just sent a patch (also attached for convenience)
> > that should solve the problem.
>
> Thanks much! I applied this to -rc2 and it appears to work fine in some
> brief testing (a dozen suspend/resume cycles)
>
> > Reported-by: Keith Packard <[email protected]>
> > Signed-off-by: Vinicius Costa Gomes <[email protected]>
>
> Tested-by: Keith Packard <[email protected]>
I applied the patch to bluetooth-2.6. Thank you all.
--
Gustavo F. Padovan
http://profusion.mobi
Hi Keith,
On 16:13 Fri 08 Apr, Keith Packard wrote:
> On Fri, 8 Apr 2011 23:44:51 +0200 (CEST), Thomas Gleixner <[email protected]> wrote:
>
> > Can the bluetooth folks please have a look at that ASAP? The obvious
> > fast fix for Linus tree is to revert the second hunk for now, but this
> > needs to be fixed proper.
>
> Who will submit this patch? I'd rather have your name on it so that
> people come complain at you...
I took a shot at it and just sent a patch (also attached for convenience)
that should solve the problem.
>
> --
> [email protected]
Cheers,
--
Vinicius
On Sat, 23 Apr 2011 13:37:59 -0300
"Gustavo F. Padovan" <[email protected]> wrote:
> * Keith Packard <[email protected]> [2011-04-21 17:04:51 -0700]:
>
> >
> > On Thu, 21 Apr 2011 19:59:53 -0300, "Gustavo F. Padovan" <[email protected]> wrote:
> >
> > > Patch is on the way to mainline. It's currently on wireless-2.6. Next steps
> > > are net-2.6 -> linux-2.6.
> >
> > I didn't see it on linux-kernel, so I was a bit worried. I'd love to
> > test it in place of the kludge that Vinicius sent me.
>
> I usually don't cc linux-kernel in my pull-requests, maybe I should.
Did this hit mainline already? I seem to suffer from this still with -rc5
--
Stefan Seyfried
"Dispatch war rocket Ajax to bring back his body!"