2009-02-18 21:32:21

by Anders Eriksson

[permalink] [raw]
Subject: No ProbeResp == dead interface?


Every so often I get in syslog:
wlan0: No ProbeResp from current AP 00:18:39:c0:5f:50 - assume out of range

After that the interface carries no more traffic and the onlly way I've found
to revive the card is to power cycle the machine. /etc/iniit.d/net.wlan
restart does not work, and "ifconfig wlan0 up ; iwlist scan" shows no results.
Is there another way to revive the card?

I'm on 2.6.29-rc3, and the card is:
00:0a.0 Network controller: RaLink RT2500 802.11g Cardbus/mini-PCI (rev 01)
Subsystem: RaLink Device 2560
Flags: bus master, slow devsel, latency 64, IRQ 11
Memory at efffe000 (32-bit, non-prefetchable) [size=8K]
Capabilities: [40] Power Management version 2
Kernel driver in use: rt2500pci
Kernel modules: rt2500pci

/Anders



2009-02-20 07:47:32

by Anders Eriksson

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?


[email protected] said:
> The first thing to try would be to see if rmmod rt61pci; modprobe rt61pci is
> enough to recover it. If it is actually a cardbus card then you could also
> try unplugging and replugging it.

It's a rt2500, so I use that module. Should I use rt61pci?
If memory servers, rmmod/modprobe does not help. It is a PCI card with no loose
parts on it so unplugging/plugging is not really an option.


[email protected] said:
> Neither of those is an ideal solution though. Do you fancy the task of trying
> to diagnose the underlying fault which is likely to require at the very
> least making sure the driver is built with debugfs support enabled and may
> require adding a few printk statements into the kernel modules to reveal
> more details of the behaviour?

No problem. I've never used debugfs, but I kind of know my way around the
kernel. The problem has existed at least since the opensource driver got
merged into mainline. I'm not sure if the closed source one had the same issue.
So bisecting is not really an option.

So, where do I sprinkle the prink's, and what info is useful?

Thanks!
/Anders


2009-02-20 23:07:38

by Adam Baker

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?

On Friday 20 February 2009, Anders Eriksson wrote:
> [email protected] said:
> > The first thing to try would be to see if rmmod rt61pci; modprobe rt61pci
> > is enough to recover it. If it is actually a cardbus card then you could
> > also try unplugging and replugging it.
>
> It's a rt2500, so I use that module. Should I use rt61pci?

No, I'm getting confused with another post. rt2500pci is correct.

> If memory servers, rmmod/modprobe does not help. It is a PCI card with no
> loose parts on it so unplugging/plugging is not really an option.

rmmod / modprobe not working isn't a good sign. After a modprobe there is a
pretty comprehensive reconfigure. When you say you currently have to power
cycle, have you also tried a reboot

>
> [email protected] said:
> > Neither of those is an ideal solution though. Do you fancy the task of
> > trying to diagnose the underlying fault which is likely to require at the
> > very least making sure the driver is built with debugfs support enabled
> > and may require adding a few printk statements into the kernel modules
> > to reveal more details of the behaviour?
>
> No problem. I've never used debugfs, but I kind of know my way around the
> kernel. The problem has existed at least since the opensource driver got
> merged into mainline. I'm not sure if the closed source one had the same
> issue. So bisecting is not really an option.

The first thing to do is check that your kernel includes both
CONFIG_MAC80211_DEBUGFS=y
and
CONFIG_RT2X00_LIB_DEBUGFS=y

then you can mount the debugfs file system somewhere

mount -t debugfs none /mnt_point

and have a look in there under ieee80211/phy0/rt2500pci for first. Probable
the most useful thing there is the queue counters which should let you see if
the card is

a) trying to transmit anything
b) receiving anything

Would be good to know if the legacy driver has the same issue - if it does
that might point to a hardware fault.

>
> So, where do I sprinkle the prink's, and what info is useful?
>

Adam.



2009-02-21 14:55:46

by Anders Eriksson

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?


[email protected] said:
> rmmod / modprobe not working isn't a good sign. After a modprobe there is a
> pretty comprehensive reconfigure. When you say you currently have to power
> cycle, have you also tried a reboot

I'll make another go at rmmod/modprobe at the next event to make sure.


[email protected] said:
> and have a look in there under ieee80211/phy0/rt2500pci for first. Probable
> the most useful thing there is the queue counters which should let you see if
> the card is

> a) trying to transmit anything
> b) receiving anything

Done that. I currently have:

chipset:rt chip: 0201
chipset:rf chip: 0003
chipset:revision:00000004
chipset:
chipset:csr length: 93
chipset:eeprom length: 256
chipset:bbp length: 64
chipset:rf length: 5
dev_flags:0x00000a2f
driver:driver: rt2500pci
driver:version: 2.2.3
driver:compiled: Feb 21 2009 15:31:04
Binary file queue/dump matches
queue/queue:qid count limit length index done crypto
queue/queue:14 0 24 6612 12 0 0
queue/queue:0 22 24 0 22 22 0
queue/queue:1 707 24 0 11 11 0
queue/queue:16 0 1 0 0 0 0
queue/queue:17 0 8 0 0 0 0
register/bbp_offset:0
register/bbp_value:0x12
register/csr_offset:0
register/csr_value:0x00000004
register/eeprom_offset:0
register/eeprom_value:0x2560
register/rf_offset:0
register/rf_value:0x00000000

What should I look out for wrt the queue counters?

> Would be good to know if the legacy driver has the same issue - if it does
> that might point to a hardware fault.

Is that still maintained and working with new kernels? I recall needing to
have different a different config setup for with the legacy driver (other
name, and whatnot), so it's been a while.

This is a "once is a week" bug, so it's hard to say for sure when it's working
correctly. If you really want me to, I could try the legay one for a couple
of weeks...

/Anders



2009-02-21 21:23:25

by Adam Baker

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?

On Saturday 21 February 2009, Anders Eriksson wrote:
>
> What should I look out for wrt the queue counters?
>

The most interesting question would be which (if any) counters continue to
increment after it has failed.

> > Would be good to know if the legacy driver has the same issue - if it
> > does that might point to a hardware fault.
>
> Is that still maintained and working with new kernels? I recall needing to
> have different a different config setup for with the legacy driver (other
> name, and whatnot), so it's been a while.
>

According to http://rt2x00.serialmonkey.com/phpBB/viewtopic.php?f=3&t=5174 it
works on 2.6.27 but a patch still needs writing for 2.6.28 onwards.

> This is a "once is a week" bug, so it's hard to say for sure when it's
> working correctly. If you really want me to, I could try the legay one for
> a couple of weeks...
>

Such low frequency bugs are usually a pain to track down. That also precludes
some of the more intrusive monitoring options that would result in huge logs
if applied for that long.

It's often worth checking dmesg output when it does fail too. Sometimes there
is useful info there that doesn't make it to syslog depending on your log
level settings.

Adam

2009-02-19 23:26:37

by Adam Baker

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?

On Wednesday 18 February 2009, Anders Eriksson wrote:
> Every so often I get in syslog:
> wlan0: No ProbeResp from current AP 00:18:39:c0:5f:50 - assume out of range
>
> After that the interface carries no more traffic and the onlly way I've
> found to revive the card is to power cycle the machine.
> /etc/iniit.d/net.wlan restart does not work, and "ifconfig wlan0 up ;
> iwlist scan" shows no results. Is there another way to revive the card?

The first thing to try would be to see if rmmod rt61pci; modprobe rt61pci is
enough to recover it. If it is actually a cardbus card then you could also
try unplugging and replugging it.

Neither of those is an ideal solution though. Do you fancy the task of trying
to diagnose the underlying fault which is likely to require at the very least
making sure the driver is built with debugfs support enabled and may require
adding a few printk statements into the kernel modules to reveal more details
of the behaviour?

>
> I'm on 2.6.29-rc3, and the card is:
> 00:0a.0 Network controller: RaLink RT2500 802.11g Cardbus/mini-PCI (rev 01)
> Subsystem: RaLink Device 2560
> Flags: bus master, slow devsel, latency 64, IRQ 11
> Memory at efffe000 (32-bit, non-prefetchable) [size=8K]
> Capabilities: [40] Power Management version 2
> Kernel driver in use: rt2500pci
> Kernel modules: rt2500pci
>


2009-03-31 18:18:10

by Anders Eriksson

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?


Looking deeper into this I must confess I was wrong here:

[email protected] said:
> Here's another trace of what happens due to lost ProbeResp. The "count" for
> queue 0 stops to increase after that, fwiw. rmmod/modprobe doesn't help. A
> reboot is necessary.

rmmod/modprobe + manually configuring the itnerface DOES seem to fix the
problem (somehow gentoo's "restart" of the inteface does not. That needs
further investigation)

I looked at the code, and "No ProbeResp" sits too far from the rt2500pci module
for me to find out what to do. How is reassociation normally triggered?

/Anders




2009-03-19 21:57:01

by Adam Baker

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?

On Sunday 08 March 2009, Anders Eriksson wrote:
> So,
> Now it has happened again. :-(
>
> I can conclude that a rmmod/modprobe cycle does not help.
>

Well I can see from those logs that it is only the transmit side that is
failing, some traffic is being seen on the receive side.

Does your hardware have an rfkill button and do you have rfkill support
enabled? I'm wondering if rfkill could be activated by a glitch on the
signal.

Other than that this is going to be tricky to do anything with. The fact that
rmmod doesn't help suggests that there is a reasonable possibility of this
being a hardware issue that the software can't fix (assuming this is a rt2500
issue, not something deeper in the stack). You can't do any very intrusive
monitoring as the system needs to be left too long before it fails, except
possibly by rmmodding the original driver when it fails and insmodding a new
one that contains lots of printk statements in the tx path to see which het
triggered. If trying that approach don't forget to instrument
rt2500pci_interrupt as well as rt2x00mac_tx

Adam

2009-03-08 16:28:53

by Anders Eriksson

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?


So,
Now it has happened again. :-(

I can conclude that a rmmod/modprobe cycle does not help.

Some logs:

At around noon, there was some hickup, from which we recovered:
Mar 8 12:54:30 tippex wlan0: deauthenticated (Reason: 7)
Mar 8 12:54:31 tippex wlan0: direct probe to AP 00:18:39:c0:5f:50 try 1
Mar 8 12:54:31 tippex wlan0 direct probe responded
Mar 8 12:54:31 tippex wlan0: authenticate with AP 00:18:39:c0:5f:50
Mar 8 12:54:31 tippex wlan0: authenticated
Mar 8 12:54:31 tippex wlan0: associate with AP 00:18:39:c0:5f:50
Mar 8 12:54:31 tippex wlan0: RX ReassocResp from 00:18:39:c0:5f:50 (capab=0x431 status=0 aid=1)
Mar 8 12:54:31 tippex wlan0: associated


The, at 15:18, we got the dreaded "No ProbeResp" again. Now I have a debugfs
dump for that time as well:
Mar 8 15:09:01 tippex cron[6326]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:09:02 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:09:02 tippex wlan_queue: 14 0 24 51143 23 0 0
Mar 8 15:09:02 tippex wlan_queue: 0 4233 24 0 9 9 0
Mar 8 15:09:02 tippex wlan_queue: 1 1356551 24 0 23 23 0
Mar 8 15:09:02 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:09:02 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:10:01 tippex cron[6334]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:10:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:10:01 tippex wlan_queue: 14 0 24 51854 14 0 0
Mar 8 15:10:01 tippex wlan_queue: 0 4235 24 0 11 11 0
Mar 8 15:10:01 tippex wlan_queue: 1 1356618 24 0 18 18 0
Mar 8 15:10:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:10:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:11:01 tippex cron[6362]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:11:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:11:01 tippex wlan_queue: 14 0 24 52586 2 0 0
Mar 8 15:11:01 tippex wlan_queue: 0 4236 24 0 12 12 0
Mar 8 15:11:01 tippex wlan_queue: 1 1356712 24 0 16 16 0
Mar 8 15:11:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:11:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:12:01 tippex cron[6379]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:12:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:12:01 tippex wlan_queue: 14 0 24 53363 11 0 0
Mar 8 15:12:01 tippex wlan_queue: 0 4237 24 0 13 13 0
Mar 8 15:12:01 tippex wlan_queue: 1 1356822 24 0 6 6 0
Mar 8 15:12:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:12:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:13:01 tippex cron[6386]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:13:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:13:01 tippex wlan_queue: 14 0 24 54037 13 0 0
Mar 8 15:13:01 tippex wlan_queue: 0 4238 24 0 14 14 0
Mar 8 15:13:01 tippex wlan_queue: 1 1356876 24 0 12 12 0
Mar 8 15:13:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:13:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:14:01 tippex cron[6392]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:14:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:14:01 tippex wlan_queue: 14 0 24 54762 18 0 0
Mar 8 15:14:01 tippex wlan_queue: 0 4239 24 0 15 15 0
Mar 8 15:14:01 tippex wlan_queue: 1 1356949 24 0 13 13 0
Mar 8 15:14:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:14:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:14:18 tippex wlan0: No ProbeResp from current AP 00:18:39:c0:5f:50 - assume out of range
Mar 8 15:15:01 tippex cron[6397]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:15:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:15:01 tippex wlan_queue: 14 0 24 55174 22 0 0
Mar 8 15:15:01 tippex wlan_queue: 0 4243 24 0 19 19 0
Mar 8 15:15:01 tippex wlan_queue: 1 1356953 24 0 17 17 0
Mar 8 15:15:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:15:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:16:01 tippex cron[6404]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:16:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:16:01 tippex wlan_queue: 14 0 24 55791 15 0 0
Mar 8 15:16:01 tippex wlan_queue: 0 4243 24 0 19 19 0
Mar 8 15:16:01 tippex wlan_queue: 1 1356953 24 0 17 17 0
Mar 8 15:16:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:16:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:17:01 tippex cron[6410]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:17:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:17:01 tippex wlan_queue: 14 0 24 56411 11 0 0
Mar 8 15:17:01 tippex wlan_queue: 0 4243 24 0 19 19 0
Mar 8 15:17:01 tippex wlan_queue: 1 1356953 24 0 17 17 0
Mar 8 15:17:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:17:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:18:01 tippex cron[6416]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:18:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:18:01 tippex wlan_queue: 14 0 24 57031 7 0 0
Mar 8 15:18:01 tippex wlan_queue: 0 4243 24 0 19 19 0
Mar 8 15:18:01 tippex wlan_queue: 1 1356953 24 0 17 17 0
Mar 8 15:18:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:18:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:19:01 tippex cron[6422]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:19:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:19:01 tippex wlan_queue: 14 0 24 57644 20 0 0
Mar 8 15:19:01 tippex wlan_queue: 0 4243 24 0 19 19 0
Mar 8 15:19:01 tippex wlan_queue: 1 1356953 24 0 17 17 0
Mar 8 15:19:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:19:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:20:01 tippex cron[6431]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:20:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:20:01 tippex wlan_queue: 14 0 24 58269 21 0 0
Mar 8 15:20:01 tippex wlan_queue: 0 4243 24 0 19 19 0
Mar 8 15:20:01 tippex wlan_queue: 1 1356953 24 0 17 17 0
Mar 8 15:20:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:20:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:21:01 tippex cron[6445]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:21:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:21:01 tippex wlan_queue: 14 0 24 58909 13 0 0
Mar 8 15:21:01 tippex wlan_queue: 0 4243 24 0 19 19 0
Mar 8 15:21:01 tippex wlan_queue: 1 1356953 24 0 17 17 0
Mar 8 15:21:01 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:21:01 tippex wlan_queue: 17 0 8 0 0 0 0
Mar 8 15:22:01 tippex cron[6451]: (root) CMD (cat /debugfs/ieee80211/phy0/rt2500pci/queue/queue | logger -t wlan_queue)
Mar 8 15:22:01 tippex wlan_queue: qid count limit length index done crypto
Mar 8 15:22:01 tippex wlan_queue: 14 0 24 59538 18 0 0
Mar 8 15:22:02 tippex wlan_queue: 0 4243 24 0 19 19 0
Mar 8 15:22:02 tippex wlan_queue: 1 1356953 24 0 17 17 0
Mar 8 15:22:02 tippex wlan_queue: 16 0 1 0 0 0 0
Mar 8 15:22:02 tippex wlan_queue: 17 0 8 0 0 0 0



Trying to rmmod/modprobe, I end up with:

ar 8 17:07:17 tippex ADDRCONF(NETDEV_UP): wlan0: link is not ready
Mar 8 17:07:18 tippex wlan0: authenticate with AP 00:18:39:c0:5f:50
Mar 8 17:07:18 tippex wlan0: authenticate with AP 00:18:39:c0:5f:50
Mar 8 17:07:18 tippex wlan0: authenticate with AP 00:18:39:c0:5f:50
Mar 8 17:07:18 tippex wlan0: authenticate with AP 00:18:39:c0:5f:50
Mar 8 17:07:18 tippex wlan0: authentication with AP 00:18:39:c0:5f:50 timed out
Mar 8 17:07:28 tippex /etc/init.d/net.wlan0[7602]: Failed to configure wireless for wlan0
Mar 8 17:07:28 tippex wlan0: direct probe to AP 00:18:39:c0:5f:50 try 1
Mar 8 17:07:28 tippex wlan0: direct probe to AP 00:18:39:c0:5f:50 try 1
Mar 8 17:07:28 tippex wlan0: deauthenticating by local choice (reason=3)
Mar 8 17:07:28 tippex /etc/init.d/net.wlan0[7487]: ERROR: net.wlan0 failed to start


The only way to get the interface back to operational status is to
/sbin/reboot the machine.

Any ideas how this can be solved?
/Anders



2009-03-20 06:31:18

by Anders Eriksson

[permalink] [raw]
Subject: Re: [Rt2400-devel] No ProbeResp == dead interface?


[email protected] said:
> Well I can see from those logs that it is only the transmit side that is
> failing, some traffic is being seen on the receive side.

> Does your hardware have an rfkill button and do you have rfkill support
> enabled? I'm wondering if rfkill could be activated by a glitch on the
> signal.

Rfkill is indeed enabled (for no good reason, really). I'll cook a new one
without rfkill and see what happens.


[email protected] said:
> If trying that approach don't forget to instrument rt2500pci_interrupt as
> well as rt2x00mac_tx

Not knowing anything of the wireless code, I have no idea where to start to
look. Are these the best functions to start to look in (i.e. sprinkling printf
in)?

/Anders