2010-07-19 22:19:35

by Marcel Holtmann

Subject: iwlagn and many firmware restarts with Fedora kernel

Hi,

so during the last few weeks, I have seen a huge number of firmware
restarts with my Intel 5350 card and Fedora 13 kernel (2.6.33.6-147).

iwlagn 0000:03:00.0: low ack count detected, restart firmware
iwlagn 0000:03:00.0: On demand firmware reload
iwlagn 0000:03:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:03:00.0: queue number out of range: 0, must be 10 to 19
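The last log line implies a bounds check: the code that stops an aggregation (AGG) session only accepts queues 10 to 19 and rejects queue 0. Here is a minimal Python sketch of that kind of check. It assumes the 10 to 19 range printed in the log. The function name `check_agg_queue` is hypothetical, and this is not the driver's actual code:

```python
from typing import Optional

# Range taken from the log line "must be 10 to 19"; assumed, not read
# from the driver source.
FIRST_AGG_QUEUE = 10
LAST_AGG_QUEUE = 19


def check_agg_queue(txq_id: int) -> Optional[str]:
    """Return an error message if txq_id is not an aggregation queue,
    or None if it lies within FIRST_AGG_QUEUE..LAST_AGG_QUEUE."""
    if FIRST_AGG_QUEUE <= txq_id <= LAST_AGG_QUEUE:
        return None
    return (f"queue number out of range: {txq_id}, "
            f"must be {FIRST_AGG_QUEUE} to {LAST_AGG_QUEUE}")
```

With these values, `check_agg_queue(0)` produces the same text as the last log line, and queues 10 through 19 pass.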

When this happens, I don't see it just once; I normally see it 10-20
times, and connectivity is stalled until the card comes back to
life. I have seen patches that should have fixed this symptom, but they
might also be worth sending to -stable since this is a major hassle.

The time without connectivity is around 4-5 minutes or longer
when this happens. Not really funny.
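The "10-20 times" and "4-5 minutes" figures can be quantified by counting the restart lines in a saved kernel log. Here is a rough Python sketch. It assumes the log has already been parsed into (seconds, line) pairs, for example from dmesg timestamps. The helper name is hypothetical. The span between the first and last restart only approximates the outage, since connectivity returns some time after the last restart:

```python
from typing import Iterable, Tuple

# Substring of the "low ack count detected, restart firmware" line above.
RESTART_MARKER = "restart firmware"


def summarize_restarts(entries: Iterable[Tuple[float, str]]) -> Tuple[int, float]:
    """Count firmware restarts and the seconds between the first and last one.

    entries: (timestamp_in_seconds, log_line) pairs, e.g. parsed from dmesg.
    """
    times = [ts for ts, line in entries if RESTART_MARKER in line]
    if not times:
        return 0, 0.0
    return len(times), max(times) - min(times)
```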

Regards

Marcel




2010-07-21 19:59:05

by drago01

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

On Tue, Jul 20, 2010 at 1:56 AM, Guy, Wey-Yi <[email protected]> wrote:
> Hi drago,
>
>
> Are you using a 5350? Here I attach an "RFC patch"; could you give it a try to
> see if it helps?

Not quite, I am on a 5300; your patch seems to only touch the 5350 code
... should I try the same change for the 5300?

2010-07-19 23:29:38

by Marcel Holtmann

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

Hi,

> > so during the last few weeks, I have seen a huge number of firmware
> > restarts with my Intel 5350 card and Fedora 13 kernel (2.6.33.6-147).
> >
> > iwlagn 0000:03:00.0: low ack count detected, restart firmware
> > iwlagn 0000:03:00.0: On demand firmware reload
> > iwlagn 0000:03:00.0: Stopping AGG while state not ON or starting
> > iwlagn 0000:03:00.0: queue number out of range: 0, must be 10 to 19
> >
> > When this happens, I don't see it just once; I normally see it 10-20
> > times, and connectivity is stalled until the card comes back to
> > life. I have seen patches that should have fixed this symptom, but they
> > might also be worth sending to -stable since this is a major hassle.
> >
> > The time without connectivity is around 4-5 minutes or longer
> > when this happens. Not really funny.
>
> This happens here too, even with the 2.6.34.1-9.fc13.x86_64 kernel;
> when this happens reloading the iwlagn module seems to be the only
> (quick) way to get it back to life.

the quick way is reloading the module, that is true. Otherwise you have
to sit it out. It always comes back nicely and starts working again.

Regards

Marcel



2010-07-23 22:32:16

by drago01

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

On Thu, Jul 22, 2010 at 8:37 PM, Guy, Wey-Yi <[email protected]> wrote:
> Hi drago,
>
>
> On Wed, 2010-07-21 at 14:00 -0700, drago01 wrote:
>> On Wed, Jul 21, 2010 at 10:37 PM, Guy, Wey-Yi <[email protected]> wrote:
>> > Hi drago,
>> > On Wed, 2010-07-21 at 12:59 -0700, drago01 wrote:
>> >> On Tue, Jul 20, 2010 at 1:56 AM, Guy, Wey-Yi <[email protected]> wrote:
>> >> > Hi drago,
>> >> >
>> >> >
>> >> > Are you using a 5350? Here I attach an "RFC patch"; could you give it a try to
>> >> > see if it helps?
>> >>
>> >> Not quite, I am on a 5300; your patch seems to only touch the 5350 code
>> >> ... should I try the same change for the 5300?
>> >
>> > Yes, please
>>
>> Hi,
>>
>> As there is no such field in .34, I patched the .35 driver, which seems
>> to be fine with the change ... I couldn't trigger it using the
>> close-the-lid (no suspend) and wait-a-bit trick ... but I have not used
>> it long enough to say for certain that it's gone.
>>
>> But unfortunately the driver has a different issue: it spams my log with tons of:
>>
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>>
> Yes, I am aware of this; it was introduced by a recent patch
> commit d9763a384216336e180399b69461eae37f6c4f54
>
> I will work on a new patch to help address this.

OK, as for the other patch: after a day in use I could not trigger the
problem, so it does indeed fix it.

Thanks

2010-07-22 18:38:08

by Wey-Yi Guy

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

Hi drago,


On Wed, 2010-07-21 at 14:00 -0700, drago01 wrote:
> On Wed, Jul 21, 2010 at 10:37 PM, Guy, Wey-Yi <[email protected]> wrote:
> > Hi drago,
> > On Wed, 2010-07-21 at 12:59 -0700, drago01 wrote:
> >> On Tue, Jul 20, 2010 at 1:56 AM, Guy, Wey-Yi <[email protected]> wrote:
> >> > Hi drago,
> >> >
> >> >
> > >> > Are you using a 5350? Here I attach an "RFC patch"; could you give it a try to
> > >> > see if it helps?
> >>
> >> Not quite, I am on a 5300; your patch seems to only touch the 5350 code
> >> ... should I try the same change for the 5300?
> >
> > Yes, please
>
> Hi,
>
> As there is no such field in .34, I patched the .35 driver, which seems
> to be fine with the change ... I couldn't trigger it using the
> close-the-lid (no suspend) and wait-a-bit trick ... but I have not used
> it long enough to say for certain that it's gone.
>
> But unfortunately the driver has a different issue: it spams my log with tons of:
>
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>
Yes, I am aware of this; it was introduced by a recent patch
commit d9763a384216336e180399b69461eae37f6c4f54

I will work on a new patch to help address this.

Thanks
Wey


2010-07-21 20:37:45

by Wey-Yi Guy

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

Hi drago,
On Wed, 2010-07-21 at 12:59 -0700, drago01 wrote:
> On Tue, Jul 20, 2010 at 1:56 AM, Guy, Wey-Yi <[email protected]> wrote:
> > Hi drago,
> >
> >
> > Are you using a 5350? Here I attach an "RFC patch"; could you give it a try to
> > see if it helps?
>
> Not quite, I am on a 5300; your patch seems to only touch the 5350 code
> ... should I try the same change for the 5300?

Yes, please

Thanks
Wey


2010-07-25 21:25:21

by drago01

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

On Sat, Jul 24, 2010 at 6:34 AM, Guy, Wey-Yi W <[email protected]> wrote:
>
> Great, thank you very much for testing it. I will prepare the patch for upstream.

OK, thanks. Please don't forget about stable, as it seems that both .33
and .34 are affected too.

> Btw, I also have a patch to help with the other problem you encountered.

OK, thanks I'll test it once it shows up.

2010-07-19 23:28:42

by Marcel Holtmann

Subject: RE: iwlagn and many firmware restarts with Fedora kernel

Hi Wey,

> so during the last few weeks, I have seen a huge number of firmware
> restarts with my Intel 5350 card and Fedora 13 kernel (2.6.33.6-147).
>
> iwlagn 0000:03:00.0: low ack count detected, restart firmware
> iwlagn 0000:03:00.0: On demand firmware reload
> iwlagn 0000:03:00.0: Stopping AGG while state not ON or starting
> iwlagn 0000:03:00.0: queue number out of range: 0, must be 10 to 19
>
> When this happens, I don't see it just once; I normally see it 10-20
> times, and connectivity is stalled until the card comes back to
> life. I have seen patches that should have fixed this symptom, but they
> might also be worth sending to -stable since this is a major hassle.
>
> The time without connectivity is around 4-5 minutes or longer
> when this happens. Not really funny.
>
> Which patch are you referring to? I will try to submit it to stable.

no patch in particular, I have just seen some commit messages indicating
that this got fixed.

Regards

Marcel



2010-07-19 22:33:05

by drago01

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

On Mon, Jul 19, 2010 at 8:43 PM, Marcel Holtmann <[email protected]> wrote:
> Hi,
>
> so during the last few weeks, I have seen a huge number of firmware
> restarts with my Intel 5350 card and Fedora 13 kernel (2.6.33.6-147).
>
> iwlagn 0000:03:00.0: low ack count detected, restart firmware
> iwlagn 0000:03:00.0: On demand firmware reload
> iwlagn 0000:03:00.0: Stopping AGG while state not ON or starting
> iwlagn 0000:03:00.0: queue number out of range: 0, must be 10 to 19
>
> When this happens, I don't see it just once; I normally see it 10-20
> times, and connectivity is stalled until the card comes back to
> life. I have seen patches that should have fixed this symptom, but they
> might also be worth sending to -stable since this is a major hassle.
>
> The time without connectivity is around 4-5 minutes or longer
> when this happens. Not really funny.

This happens here too, even with the 2.6.34.1-9.fc13.x86_64 kernel;
when this happens reloading the iwlagn module seems to be the only
(quick) way to get it back to life.
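The reload workaround described here consists of unloading the driver module and loading it again. Here is a minimal Python sketch that wraps the standard `modprobe -r` and `modprobe` commands. The helper name `reload_module` and its `dry_run` default are illustrative assumptions. Actually running the commands requires root:

```python
import subprocess
from typing import List


def reload_module(name: str = "iwlagn", dry_run: bool = True) -> List[List[str]]:
    """Unload and reload a kernel module with modprobe.

    With dry_run=True (the default) the commands are only returned, not run;
    actually running them requires root privileges.
    """
    cmds = [["modprobe", "-r", name], ["modprobe", name]]
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError if modprobe fails.
            subprocess.run(cmd, check=True)
    return cmds
```

The alternative mentioned in the thread is to wait until the card recovers on its own.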

2010-07-19 22:23:56

by Wey-Yi Guy

Subject: RE: iwlagn and many firmware restarts with Fedora kernel

Hi Marcel,

-----Original Message-----
From: linux-wireless-owner@vger.kernel.org [mailto:linux-wireless-owner@vger.kernel.org] On Behalf Of Marcel Holtmann
Sent: Monday, July 19, 2010 11:44 AM
To: linux-wireless@vger.kernel.org
Subject: iwlagn and many firmware restarts with Fedora kernel

Hi,

so during the last few weeks, I have seen a huge number of firmware
restarts with my Intel 5350 card and Fedora 13 kernel (2.6.33.6-147).

iwlagn 0000:03:00.0: low ack count detected, restart firmware
iwlagn 0000:03:00.0: On demand firmware reload
iwlagn 0000:03:00.0: Stopping AGG while state not ON or starting
iwlagn 0000:03:00.0: queue number out of range: 0, must be 10 to 19

When this happens, I don't see it just once; I normally see it 10-20
times, and connectivity is stalled until the card comes back to
life. I have seen patches that should have fixed this symptom, but they
might also be worth sending to -stable since this is a major hassle.

The time without connectivity is around 4-5 minutes or longer
when this happens. Not really funny.

Which patch are you referring to? I will try to submit it to stable.

Thanks
Wey

2010-07-21 21:00:27

by drago01

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

On Wed, Jul 21, 2010 at 10:37 PM, Guy, Wey-Yi <[email protected]> wrote:
> Hi drago,
> On Wed, 2010-07-21 at 12:59 -0700, drago01 wrote:
>> On Tue, Jul 20, 2010 at 1:56 AM, Guy, Wey-Yi <[email protected]> wrote:
>> > Hi drago,
>> >
>> >
>> > Are you using a 5350? Here I attach an "RFC patch"; could you give it a try to
>> > see if it helps?
>>
>> Not quite, I am on a 5300; your patch seems to only touch the 5350 code
>> ... should I try the same change for the 5300?
>
> Yes, please

Hi,

As there is no such field in .34, I patched the .35 driver, which seems
to be fine with the change ... I couldn't trigger it using the
close-the-lid (no suspend) and wait-a-bit trick ... but I have not used
it long enough to say for certain that it's gone.

But unfortunately the driver has a different issue: it spams my log with tons of:

iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10

messages.
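A flood of identical lines like the one above is easier to report, and to compare before and after a fix, once consecutive repeats are folded into a single count. Here is a small Python sketch that uses `itertools.groupby`. The helper name `collapse_repeats` is hypothetical, and the code only post-processes a saved log:

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def collapse_repeats(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Fold runs of identical consecutive log lines into (line, count) pairs,
    similar in spirit to syslog's "last message repeated N times"."""
    return [(line, sum(1 for _ in run)) for line, run in groupby(lines)]
```

Only adjacent duplicates are merged, so an interleaved line still splits a run in two.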

2010-07-24 04:34:46

by Wey-Yi Guy

Subject: RE: iwlagn and many firmware restarts with Fedora kernel

Hi drago,

-----Original Message-----
From: drago01 [mailto:[email protected]]
Sent: Friday, July 23, 2010 3:32 PM
To: Guy, Wey-Yi W
Cc: Marcel Holtmann; [email protected]
Subject: Re: iwlagn and many firmware restarts with Fedora kernel

On Thu, Jul 22, 2010 at 8:37 PM, Guy, Wey-Yi <[email protected]> wrote:
> Hi drago,
>
>
> On Wed, 2010-07-21 at 14:00 -0700, drago01 wrote:
>> On Wed, Jul 21, 2010 at 10:37 PM, Guy, Wey-Yi <[email protected]> wrote:
>> > Hi drago,
>> > On Wed, 2010-07-21 at 12:59 -0700, drago01 wrote:
>> >> On Tue, Jul 20, 2010 at 1:56 AM, Guy, Wey-Yi <[email protected]> wrote:
>> >> > Hi drago,
>> >> >
>> >> >
>> >> > Are you using a 5350? Here I attach an "RFC patch"; could you give it a try to
>> >> > see if it helps?
>> >>
>> >> Not quite, I am on a 5300; your patch seems to only touch the 5350 code
>> >> ... should I try the same change for the 5300?
>> >
>> > Yes, please
>>
>> Hi,
>>
>> As there is no such field in .34, I patched the .35 driver, which seems
>> to be fine with the change ... I couldn't trigger it using the
>> close-the-lid (no suspend) and wait-a-bit trick ... but I have not used
>> it long enough to say for certain that it's gone.
>>
>> But unfortunately the driver has a different issue: it spams my log with tons of:
>>
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>> iwlagn 0000:02:00.0: BA scd_flow 0 does not match txq_id 10
>>
> Yes, I am aware of this; it was introduced by a recent patch
> commit d9763a384216336e180399b69461eae37f6c4f54
>
> I will work on a new patch to help address this.

OK, as for the other patch: after a day in use I could not trigger the
problem, so it does indeed fix it.

Great, thank you very much for testing it. I will prepare the patch for upstream.

Btw, I also have a patch to help with the other problem you encountered.

Wey

2010-07-19 23:57:16

by Wey-Yi Guy

Subject: Re: iwlagn and many firmware restarts with Fedora kernel

Hi drago,

On Mon, 2010-07-19 at 16:29 -0700, Marcel Holtmann wrote:
> Hi,
>
> > > so during the last few weeks, I have seen a huge number of firmware
> > > restarts with my Intel 5350 card and Fedora 13 kernel (2.6.33.6-147).
> > >
> > > iwlagn 0000:03:00.0: low ack count detected, restart firmware
> > > iwlagn 0000:03:00.0: On demand firmware reload
> > > iwlagn 0000:03:00.0: Stopping AGG while state not ON or starting
> > > iwlagn 0000:03:00.0: queue number out of range: 0, must be 10 to 19
> > >
> > > When this happens, I don't see it just once; I normally see it 10-20
> > > times, and connectivity is stalled until the card comes back to
> > > life. I have seen patches that should have fixed this symptom, but they
> > > might also be worth sending to -stable since this is a major hassle.
> > >
> > > The time without connectivity is around 4-5 minutes or longer
> > > when this happens. Not really funny.
> >
> > This happens here too, even with the 2.6.34.1-9.fc13.x86_64 kernel;
> > when this happens reloading the iwlagn module seems to be the only
> > (quick) way to get it back to life.
>
> the quick way is reloading the module, that is true. Otherwise you have
> to sit it out. It always comes back nicely and starts working again.
>
Are you using a 5350? Here I attach an "RFC patch"; could you give it a try to
see if it helps?

Regards
Wey


Attachments:
0001-iwlwifi-extend-the-stuck-queue-monitor-timer-for-53.patch (1.67 kB)
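This archive does not include the patch itself, only its file name. The name describes a stuck-queue monitor: it declares a TX queue stuck, and so triggers a firmware reload, only after the queue has made no progress for a whole timeout. Here is a rough Python sketch of that general idea. The class, its read-pointer interface and the timeout values are hypothetical and not taken from iwlwifi:

```python
from dataclasses import dataclass


@dataclass
class StuckQueueMonitor:
    """Hypothetical watchdog: report a stuck queue when it has pending
    frames but its read pointer has not moved for `timeout` seconds."""
    timeout: float
    last_read_ptr: int = -1
    last_progress: float = 0.0

    def check(self, now: float, read_ptr: int, pending: int) -> bool:
        # Any progress (or an empty queue) resets the timer.
        if pending == 0 or read_ptr != self.last_read_ptr:
            self.last_read_ptr = read_ptr
            self.last_progress = now
            return False
        # No progress with frames still queued: stuck once the timeout elapses.
        return now - self.last_progress >= self.timeout
```

A longer `timeout` makes the monitor less likely to flag a queue that is merely slow, at the cost of detecting a genuinely stuck queue later.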