2009-05-19 22:48:25

by Luis R. Rodriguez

Subject: ath9k oops - in ath_get_rate+0x468 - Fatal exception in interrupt

I hit this oops on my home laptop this morning running
master-2009-05-18 with ath9k on 5 GHz, WPA2, against an 11n AP with
HT40 set. I reproduced it twice by running wpa_supplicant on the
interface and then manually triggering a scan; the oops happened
immediately. I planned to dig into it at the office, but I haven't
been able to reproduce it there, so pictures of the oops are linked
at the end of this mail for those interested. The "Fatal exception in
interrupt" message indicates the oops happened while in_interrupt().

I was unable to pull anything out of the laptop through netconsole.
The only way I could capture the oops was with CONFIG_FB_VESA=y and a
high VESA fb resolution. It's as good as it gets. According to gdb
with my local ath9k.ko (which is probably a bit different from the
one on my home machine), ath_get_rate+0x468 translates to:

(gdb) l *(ath_get_rate+0x468)
0x202f8 is in ath_get_rate (drivers/net/wireless/ath/ath9k/rc.c:744).
739 }
740
741 if (rate > (ath_rc_priv->rate_table_size - 1))
742 rate = ath_rc_priv->rate_table_size - 1;
743
744 ASSERT((rate_table->info[rate].valid &&
745 (ath_rc_priv->ht_cap & WLAN_RC_DS_FLAG)) ||
746 (rate_table->info[rate].valid_single_stream &&
747 !(ath_rc_priv->ht_cap & WLAN_RC_DS_FLAG)));

This seems familiar. I believe I ran into it before, but the last
time I looked I couldn't see what could have caused it. I'm inclined
to lift this assert, leave a WARN and use the lowest rate for now.
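
As a rough illustration of that idea, here is a minimal userspace
sketch, not the actual patch. The struct layouts, the WLAN_RC_DS_FLAG
value, and the rate_usable()/pick_rate() helpers are made up for the
example. It keeps the table size inside the table struct for brevity
and assumes index 0 is the lowest rate. In the driver the warning
would presumably be a WARN_ON() rather than fprintf().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's types. Field names follow
 * the gdb listing; everything else is assumed for illustration. */
#define WLAN_RC_DS_FLAG 0x01 /* assumed value */

struct rate_info {
    bool valid;               /* usable when dual stream is set */
    bool valid_single_stream; /* usable when dual stream is not set */
};

struct rate_table {
    const struct rate_info *info;
    int size;
};

/* The condition that the ASSERT at rc.c:744 checks. */
static bool rate_usable(const struct rate_table *t, int rate,
                        unsigned int ht_cap)
{
    if (ht_cap & WLAN_RC_DS_FLAG)
        return t->info[rate].valid;
    return t->info[rate].valid_single_stream;
}

/* Clamp the index, then warn and fall back instead of asserting. */
static int pick_rate(const struct rate_table *t, int rate,
                     unsigned int ht_cap)
{
    if (rate > t->size - 1)
        rate = t->size - 1;

    if (!rate_usable(t, rate, ht_cap)) {
        /* Userspace stand-in for a kernel warning. */
        fprintf(stderr, "rate %d invalid for ht_cap 0x%x, "
                "using lowest rate\n", rate, ht_cap);
        rate = 0; /* assumes index 0 is the lowest rate */
    }
    return rate;
}
```

With this shape an inconsistent table costs one slow frame and a log
line instead of taking the machine down in interrupt context.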

IIRC, when I last checked, my hunch was that we don't protect the
private RC data, nor do we provide an API for this to drivers. Not
sure what is best here. Johannes -- is this related to the RX race
you were mentioning?

http://bombadil.infradead.org/~mcgrof/oops-img/2009-05-19/01.jpg
http://bombadil.infradead.org/~mcgrof/oops-img/2009-05-19/02.jpg
http://bombadil.infradead.org/~mcgrof/oops-img/2009-05-19/03.jpg
http://bombadil.infradead.org/~mcgrof/oops-img/2009-05-19/04.jpg

Luis


Subject: Re: ath9k oops - in ath_get_rate+0x468 - Fatal exception in interrupt

On Wed, May 20, 2009 at 03:52:29AM +0530, Luis R. Rodriguez wrote:

> (gdb) l *(ath_get_rate+0x468)
> 0x202f8 is in ath_get_rate (drivers/net/wireless/ath/ath9k/rc.c:744).
> 739 }
> 740
> 741 if (rate > (ath_rc_priv->rate_table_size - 1))
> 742 rate = ath_rc_priv->rate_table_size - 1;
> 743
> 744 ASSERT((rate_table->info[rate].valid &&
> 745 (ath_rc_priv->ht_cap & WLAN_RC_DS_FLAG)) ||
> 746 (rate_table->info[rate].valid_single_stream &&
> 747 !(ath_rc_priv->ht_cap & WLAN_RC_DS_FLAG)));
>

Most probably we are using the wrong rate table at this moment. This
is possible only when we fail during channel set after completing the
scan. Can you please confirm whether you see failures during channel
set after changing this assert into a warning?
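
To picture how a wrong table would trip the check, here is a small
userspace sketch. Both tables are invented for the example and are not
real ath9k rate tables. WLAN_RC_DS_FLAG's value and the
rate_ok()/count_mismatches() helpers are likewise assumptions. It
counts the indexes that are valid in the table they were picked from
but fail the same condition when a different table is consulted.

```c
#include <stdbool.h>
#include <stdio.h>

#define WLAN_RC_DS_FLAG 0x01 /* assumed value, illustration only */

struct rate_info {
    bool valid;
    bool valid_single_stream;
};

/* Two invented tables of equal length; not real ath9k rate tables. */
static const struct rate_info table_a[] = {
    { true, true }, { true, true }, { true, false },
};
static const struct rate_info table_b[] = {
    { true, true }, { false, false }, { false, true },
};

/* Same condition as the ASSERT in the quoted listing. */
static bool rate_ok(const struct rate_info *info, int rate,
                    unsigned int ht_cap)
{
    if (ht_cap & WLAN_RC_DS_FLAG)
        return info[rate].valid;
    return info[rate].valid_single_stream;
}

/* Count indexes that pass in the table they were picked from but
 * fail when a different (stale) table is consulted. */
static int count_mismatches(const struct rate_info *picked_from,
                            const struct rate_info *checked_against,
                            int n, unsigned int ht_cap)
{
    int i, bad = 0;

    for (i = 0; i < n; i++) {
        if (rate_ok(picked_from, i, ht_cap) &&
            !rate_ok(checked_against, i, ht_cap)) {
            printf("index %d would trip the check\n", i);
            bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches(table_a, table_b, 3, WLAN_RC_DS_FLAG) reports
the indexes that only look valid under the first table.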

Vasanth

2009-05-19 22:25:16

by Luis R. Rodriguez

Subject: Re: ath9k oops - in ath_get_rate+0x468 - Fatal exception in interrupt

On Tue, May 19, 2009 at 3:22 PM, Luis R. Rodriguez wrote:
> Johannes -- is this related to the RX race you were
> mentioning?

Actually, this is during ieee80211_tx, so never mind.

Luis