2008-07-07 18:56:54

by Andrew Lutomirski

[permalink] [raw]
Subject: [BUG] iwl4965: huge latency and errors when unblocking the hardware rfkill

This is wireless-testing c80200cd38c265da90f0d9d031ace84aa56b0453, pulled today.

I have a Lenovo X61s with a physical rfkill switch (the kind that
slides between blocked and unblocked).

If I turn off the hardware rfkill switch (set to block) while
associated, I lose the connection (obviously). If I turn it back on
again (set to unblock), I get a lot of latency (my mouse stops moving
for a second or two) and I sometimes get errors in the syslog like
this:

iwl4965: Error sending REPLY_CT_KILL_CONFIG_CMD: time out after 500ms.

The card fails to associate afterwards. If I tell network-manager to
turn off wireless, then block rfkill, then wait a few seconds, then
unblock it, then turn nm back on, everything works again.

This seems like at least two different bugs:

- iwl4965 causes latency. This latency issue has been reproducible
every time for me on 2.6.24-ubuntu_something, 2.5.25, wireless-compat
(recent), and current wireless-testing.
- iwl4965 doesn't work right after hard unblocking rfkill.

I'm happy to do further troubleshooting and/or test patches.

--Andy


2008-07-07 20:01:30

by Andrew Lutomirski

[permalink] [raw]
Subject: Re: [BUG] iwl4965: huge latency and errors when unblocking the hardware rfkill

On Mon, Jul 7, 2008 at 3:45 PM, Dan Williams <[email protected]> wrote:
> On Mon, 2008-07-07 at 14:56 -0400, Andrew Lutomirski wrote:
>> This is wireless-testing c80200cd38c265da90f0d9d031ace84aa56b0453, pulled today.
>>
>> I have a Lenovo X61s with a physical rfkill switch (the kind that
>> slides between blocked and unblocked).
>>
>> If I turn off the hardware rfkill switch (set to block) while
>> associated, I lose the connection (obviously). If I turn it back on
>> again (set to unblock), I get a lot of latency (my mouse stops moving
>> for a second or two) and I sometimes get errors in the syslog like
>> this:
>>
>> iwl4965: Error sending REPLY_CT_KILL_CONFIG_CMD: time out after 500ms.
>>
>> The card fails to associate afterwards. If I tell network-manager to
>> turn off wireless, then block rfkill, then wait a few seconds, then
>> unblock it, then turn nm back on, everything works again.
>>
>> This seems like at least two different bugs:
>>
>> - iwl4965 causes latency. This latency issue has been reproducible
>> every time for me on 2.6.24-ubuntu_something, 2.5.25, wireless-compat
>> (recent), and current wireless-testing.
>> - iwl4965 doesn't work right after hard unblocking rfkill.
>>
>> I'm happy to do further troubleshooting and/or test patches.
>
> It depends on what HAL reports for the killswitch's status. NM asks HAL
> what the radio's rfkill state is every 6 seconds (since we can't get
> event notifications when the state changes until 2.6.27). If HAL
> reports the radio is killed, NM won't touch the device. It sounds like
> something is trying to poke the device before it's ready, which may mean
> that either NM isn't aware you have a killswitch, or HAL isn't correctly
> talking to the iwlwifi killswitch bits (either old-style or new-style
> with Henrique's patches).

OK. But IMO iwl4965 should still avoid freaking out when this happens.

>
> Can you post some logs from /var/log/daemon.log around when flipping the
> killswitch to BLOCKED, waiting about 20 seconds, then flipping it back
> to UNBLOCKED and waiting 10 seconds? That will tell whether NM is
> correctly talking to HAL or not.

Attached. While the connection was failing to happen after
re-enabling the radio, dmesg showed a lot of crap like:

[ 4514.797450] wlan0: authenticated
[ 4514.797450] wlan0: associate with AP 00:11:88:06:72:98
[ 4514.800735] wlan0: RX ReassocResp from 00:11:88:06:72:98
(capab=0x401 status=0 aid=164)
[ 4514.800743] wlan0: associated
[ 4516.834866] wlan0: No ProbeResp from current AP 00:11:88:06:72:98 -
assume out of range
[ 4519.235695] wlan0: no IPv6 routers present
[ 4520.865993] wlan0: authenticate with AP 00:11:88:06:72:98
[ 4520.870660] wlan0: authenticated
[ 4520.870660] wlan0: associate with AP 00:11:88:06:72:98
[ 4520.873867] wlan0: RX ReassocResp from 00:11:88:06:72:98
(capab=0x401 status=0 aid=164)
[ 4520.873867] wlan0: associated
[ 4522.890579] wlan0: No ProbeResp from current AP 00:11:88:06:72:98 -
assume out of range
[ 4526.909697] wlan0: authenticate with AP 00:11:88:06:72:98
[ 4526.917697] wlan0: authenticate with AP 00:11:88:06:72:98
[ 4526.917697] wlan0: authenticated
[ 4526.917697] wlan0: associate with AP 00:11:88:06:72:98
[ 4526.921697] wlan0: RX ReassocResp from 00:11:88:06:72:98
(capab=0x401 status=0 aid=164)
[ 4526.921697] wlan0: associated
[ 4528.953389] wlan0: No ProbeResp from current AP 00:11:88:06:72:98 -
assume out of range
[ 4533.066068] wlan0: authenticate with AP 00:11:88:06:72:98
[ 4533.070907] wlan0: authenticate with AP 00:11:88:06:72:98
[ 4533.070907] wlan0: authenticated
[ 4533.070907] wlan0: associate with AP 00:11:88:06:72:98
[ 4533.074906] wlan0: RX ReassocResp from 00:11:88:06:72:98
(capab=0x401 status=0 aid=164)
[ 4533.074906] wlan0: associated

--Andy

>
> Dan
>
>


Attachments:
(No filename) (3.69 kB)
rfkill.txt (28.39 kB)
Download all attachments

2008-07-08 07:05:32

by Holger Schurig

[permalink] [raw]
Subject: Re: [BUG] iwl4965: huge latency and errors when unblocking the hardware rfkill

> This doesn't look like /var/log/daemon.log...

Dan, he posted something inline and something as attachment.

2008-07-07 22:00:43

by Dan Williams

[permalink] [raw]
Subject: Re: [BUG] iwl4965: huge latency and errors when unblocking the hardware rfkill

On Mon, 2008-07-07 at 16:01 -0400, Andrew Lutomirski wrote:
> On Mon, Jul 7, 2008 at 3:45 PM, Dan Williams <[email protected]> wrote:
> > On Mon, 2008-07-07 at 14:56 -0400, Andrew Lutomirski wrote:
> >> This is wireless-testing c80200cd38c265da90f0d9d031ace84aa56b0453, pulled today.
> >>
> >> I have a Lenovo X61s with a physical rfkill switch (the kind that
> >> slides between blocked and unblocked).
> >>
> >> If I turn off the hardware rfkill switch (set to block) while
> >> associated, I lose the connection (obviously). If I turn it back on
> >> again (set to unblock), I get a lot of latency (my mouse stops moving
> >> for a second or two) and I sometimes get errors in the syslog like
> >> this:
> >>
> >> iwl4965: Error sending REPLY_CT_KILL_CONFIG_CMD: time out after 500ms.
> >>
> >> The card fails to associate afterwards. If I tell network-manager to
> >> turn off wireless, then block rfkill, then wait a few seconds, then
> >> unblock it, then turn nm back on, everything works again.
> >>
> >> This seems like at least two different bugs:
> >>
> >> - iwl4965 causes latency. This latency issue has been reproducible
> >> every time for me on 2.6.24-ubuntu_something, 2.5.25, wireless-compat
> >> (recent), and current wireless-testing.
> >> - iwl4965 doesn't work right after hard unblocking rfkill.
> >>
> >> I'm happy to do further troubleshooting and/or test patches.
> >
> > It depends on what HAL reports for the killswitch's status. NM asks HAL
> > what the radio's rfkill state is every 6 seconds (since we can't get
> > event notifications when the state changes until 2.6.27). If HAL
> > reports the radio is killed, NM won't touch the device. It sounds like
> > something is trying to poke the device before it's ready, which may mean
> > that either NM isn't aware you have a killswitch, or HAL isn't correctly
> > talking to the iwlwifi killswitch bits (either old-style or new-style
> > with Henrique's patches).
>
> OK. But IMO iwl4965 should still avoid freaking out when this happens.
>
> >
> > Can you post some logs from /var/log/daemon.log around when flipping the
> > killswitch to BLOCKED, waiting about 20 seconds, then flipping it back
> > to UNBLOCKED and waiting 10 seconds? That will tell whether NM is
> > correctly talking to HAL or not.
>
> Attached. While the connection was failing to happen after
> re-enabling the radio, dmesg showed a lot of crap like:
>
> [ 4514.797450] wlan0: authenticated
> [ 4514.797450] wlan0: associate with AP 00:11:88:06:72:98
> [ 4514.800735] wlan0: RX ReassocResp from 00:11:88:06:72:98
> (capab=0x401 status=0 aid=164)
> [ 4514.800743] wlan0: associated
> [ 4516.834866] wlan0: No ProbeResp from current AP 00:11:88:06:72:98 -
> assume out of range

This doesn't look like /var/log/daemon.log...

Dan


2008-07-07 19:46:31

by Dan Williams

[permalink] [raw]
Subject: Re: [BUG] iwl4965: huge latency and errors when unblocking the hardware rfkill

On Mon, 2008-07-07 at 14:56 -0400, Andrew Lutomirski wrote:
> This is wireless-testing c80200cd38c265da90f0d9d031ace84aa56b0453, pulled today.
>
> I have a Lenovo X61s with a physical rfkill switch (the kind that
> slides between blocked and unblocked).
>
> If I turn off the hardware rfkill switch (set to block) while
> associated, I lose the connection (obviously). If I turn it back on
> again (set to unblock), I get a lot of latency (my mouse stops moving
> for a second or two) and I sometimes get errors in the syslog like
> this:
>
> iwl4965: Error sending REPLY_CT_KILL_CONFIG_CMD: time out after 500ms.
>
> The card fails to associate afterwards. If I tell network-manager to
> turn off wireless, then block rfkill, then wait a few seconds, then
> unblock it, then turn nm back on, everything works again.
>
> This seems like at least two different bugs:
>
> - iwl4965 causes latency. This latency issue has been reproducible
> every time for me on 2.6.24-ubuntu_something, 2.5.25, wireless-compat
> (recent), and current wireless-testing.
> - iwl4965 doesn't work right after hard unblocking rfkill.
>
> I'm happy to do further troubleshooting and/or test patches.

It depends on what HAL reports for the killswitch's status. NM asks HAL
what the radio's rfkill state is every 6 seconds (since we can't get
event notifications when the state changes until 2.6.27). If HAL
reports the radio is killed, NM won't touch the device. It sounds like
something is trying to poke the device before it's ready, which may mean
that either NM isn't aware you have a killswitch, or HAL isn't correctly
talking to the iwlwifi killswitch bits (either old-style or new-style
with Henrique's patches).

Can you post some logs from /var/log/daemon.log around when flipping the
killswitch to BLOCKED, waiting about 20 seconds, then flipping it back
to UNBLOCKED and waiting 10 seconds? That will tell whether NM is
correctly talking to HAL or not.

Dan


2008-07-07 21:39:36

by Tomas Winkler

[permalink] [raw]
Subject: Re: [BUG] iwl4965: huge latency and errors when unblocking the hardware rfkill

On Mon, Jul 7, 2008 at 9:56 PM, Andrew Lutomirski <[email protected]> wrote:
> This is wireless-testing c80200cd38c265da90f0d9d031ace84aa56b0453, pulled today.
>
> I have a Lenovo X61s with a physical rfkill switch (the kind that
> slides between blocked and unblocked).
>
> If I turn off the hardware rfkill switch (set to block) while
> associated, I lose the connection (obviously). If I turn it back on
> again (set to unblock), I get a lot of latency (my mouse stops moving
> for a second or two) and I sometimes get errors in the syslog like
> this:
>
> iwl4965: Error sending REPLY_CT_KILL_CONFIG_CMD: time out after 500ms.
>
> The card fails to associate afterwards. If I tell network-manager to
> turn off wireless, then block rfkill, then wait a few seconds, then
> unblock it, then turn nm back on, everything works again.
>
> This seems like at least two different bugs:
>
> - iwl4965 causes latency. This latency issue has been reproducible
> every time for me on 2.6.24-ubuntu_something, 2.5.25, wireless-compat
> (recent), and current wireless-testing.
> - iwl4965 doesn't work right after hard unblocking rfkill.
>
> I'm happy to do further troubleshooting and/or test patches.
>
> --Andy
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
We have few RF kill fixes, just need to rebase them again after latest
Henrique's patches. Hope to post them in next days.
Thanks
Tomas

2008-07-08 16:40:32

by Dan Williams

[permalink] [raw]
Subject: Re: [BUG] iwl4965: huge latency and errors when unblocking the hardware rfkill

On Mon, 2008-07-07 at 16:01 -0400, Andrew Lutomirski wrote:
> On Mon, Jul 7, 2008 at 3:45 PM, Dan Williams <[email protected]> wrote:
> > On Mon, 2008-07-07 at 14:56 -0400, Andrew Lutomirski wrote:
> >> This is wireless-testing c80200cd38c265da90f0d9d031ace84aa56b0453, pulled today.
> >>
> >> I have a Lenovo X61s with a physical rfkill switch (the kind that
> >> slides between blocked and unblocked).
> >>
> >> If I turn off the hardware rfkill switch (set to block) while
> >> associated, I lose the connection (obviously). If I turn it back on
> >> again (set to unblock), I get a lot of latency (my mouse stops moving
> >> for a second or two) and I sometimes get errors in the syslog like
> >> this:
> >>
> >> iwl4965: Error sending REPLY_CT_KILL_CONFIG_CMD: time out after 500ms.
> >>
> >> The card fails to associate afterwards. If I tell network-manager to
> >> turn off wireless, then block rfkill, then wait a few seconds, then
> >> unblock it, then turn nm back on, everything works again.
> >>
> >> This seems like at least two different bugs:
> >>
> >> - iwl4965 causes latency. This latency issue has been reproducible
> >> every time for me on 2.6.24-ubuntu_something, 2.5.25, wireless-compat
> >> (recent), and current wireless-testing.
> >> - iwl4965 doesn't work right after hard unblocking rfkill.
> >>
> >> I'm happy to do further troubleshooting and/or test patches.
> >
> > It depends on what HAL reports for the killswitch's status. NM asks HAL
> > what the radio's rfkill state is every 6 seconds (since we can't get
> > event notifications when the state changes until 2.6.27). If HAL
> > reports the radio is killed, NM won't touch the device. It sounds like
> > something is trying to poke the device before it's ready, which may mean
> > that either NM isn't aware you have a killswitch, or HAL isn't correctly
> > talking to the iwlwifi killswitch bits (either old-style or new-style
> > with Henrique's patches).
>
> OK. But IMO iwl4965 should still avoid freaking out when this happens.
>
> >
> > Can you post some logs from /var/log/daemon.log around when flipping the
> > killswitch to BLOCKED, waiting about 20 seconds, then flipping it back
> > to UNBLOCKED and waiting 10 seconds? That will tell whether NM is
> > correctly talking to HAL or not.
>
> Attached. While the connection was failing to happen after
> re-enabling the radio, dmesg showed a lot of crap like:

Ok, so with the patches HAL no longer knows about the killswitch and NM
is unable to poll the switch for the radio state. So NM doesn't stop
trying to poke the radio and connect to stuff because it doesn't know
the radio is off.

Yes, iwlwifi shouldn't freak out. We also need to get HAL fixed up to
recognize the new rfkill stuff as a switch.

Dan