2008-10-29 19:19:50

by Tim Gardner

[permalink] [raw]
Subject: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

This commit 'ath5k: Update interrupt masking code' causes stalls under
load when built using compat-wireless on 2.6.27. There is no output from
dmesg, but it occasionally recovers and moves a few more packets before
stalling again. Luis suggested reverting
eb9d4e8399181357cb6f6625ba7f849987432c6c which I did. Now it appears to
work normally.

I also think this patch should be broken into multiple pieces. There
appear to be at least 3 functional changes addressed in this single patch.

rtg
--
Tim Gardner [email protected]


2008-11-02 18:42:25

by Bob Copeland

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, Nov 02, 2008 at 11:10:04AM +0200, Nick Kossifidis wrote:
> Can you test the attached patch ?
>
> It works for me (tested on a AR5413)...
>
> makis linux # iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [ 4] local 192.168.1.104 port 5001 connected with 192.168.1.10 port 43192
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.0-10.6 sec 21.8 MBytes 17.2 Mbits/sec

This patch works for me (I was seeing the same stalls).

I somehow managed to trigger this after reloading the module:

Nov 2 13:12:05 sludge kernel: ------------[ cut here ]------------
Nov 2 13:12:05 sludge kernel: WARNING: at net/mac80211/main.c:236
ieee80211_hw_config+0x82/0x8c [mac80211]()
Nov 2 13:12:05 sludge kernel: Modules linked in: ath5k af_packet
sha256_generic aes_i586 aes_generic cbc loop i915 drm binfmt_misc
acpi_cpufreq fan container nls_utf8 hfsplus dm_crypt dm_mod kvm_intel
kvm fuse sbp2 snd_hda_intel snd_pcm_oss snd_pcm snd_mixer_oss appletouch
hid_apple snd_seq_dummy snd_seq_oss arc4 ecb snd_seq_midi usbhid
snd_rawmidi snd_seq_midi_event mac80211 snd_seq ohci1394 snd_timer
snd_seq_device sr_mod ieee1394 sky2 cfg80211 sg cdrom rtc ehci_hcd
uhci_hcd thermal bitrev crc32 snd snd_page_alloc battery ac processor
button evdev unix [last unloaded: ath5k]
Nov 2 13:12:05 sludge kernel: Pid: 9748, comm: ath5k_pci Tainted: G
W 2.6.28-rc2-wl #19
[...]

But I think that was just a coincidence with the hardware getting
stuck in a bad state -- after reboot it was fine. If not, this at least
papers over the warning :)

diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
index a298e8a..0308165 100644
--- a/drivers/net/wireless/ath5k/base.c
+++ b/drivers/net/wireless/ath5k/base.c
@@ -2796,7 +2796,8 @@ ath5k_config(struct ieee80211_hw *hw, u32 changed)
sc->bintval = conf->beacon_int;
sc->power_level = conf->power_level;

- return ath5k_chan_set(sc, conf->channel);
+ ath5k_chan_set(sc, conf->channel);
+ return 0;
}

static int
--
1.5.4.2.182.gb3092

--
Bob Copeland %% http://www.bobcopeland.com


2008-11-04 07:13:40

by Kalle Valo

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

Johannes Berg <[email protected]> writes:

>> > Also,
>> > that's a bad thing to do in userspace, imho.
>>
>> What should we do with these rare failures then?
>
> print an error message? ignore them? try again?

I would say that try few (three?) times and if it still fails, restart
firmware and push the same settings again to the firmware. There's
very little that mac80211 could do here, driver has to make the
decisions in case like this.

--
Kalle Valo

2008-11-02 08:32:14

by Nick Kossifidis

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

2008/11/1 Luis R. Rodriguez <[email protected]>:
> On Wed, Oct 29, 2008 at 12:19 PM, Tim Gardner <[email protected]> wrote:
>> This commit 'ath5k: Update interrupt masking code' causes stalls under
>> load when built using compat-wireless on 2.6.27. There is no output from
>> dmesg, but it occasionally recovers and moves a few more packets before
>> stalling again. Luis suggested reverting
>> eb9d4e8399181357cb6f6625ba7f849987432c6c which I did. Now it appears to
>> work normally.
>>
>> I also think this patch should be broken into multiple pieces. There
>> appear to be at least 3 functional changes addressed in this single patch.
>
> Agreed, I just tested with the latest wireless-testing and the issue
> is still present. I reviewed the commit but its a big too large to pin
> point the exact issue, my AR5414 goes up but then quickly becomes
> unusable. Nick how about we revert this and you can then split this up
> into a good number of patches to make the review easier and also to be
> able to pin point the exact issue better?
>
> The SHA1 remains the same but just in case the title of the patch is:
>
> ath5k: Update interrupt masking code
>
> With it reverted I'm cruising and can ditch MadWifi completely, which
> is the idea.
>
> Luis
>

Code doesn't introduce many functional changes, we can't eg. patch
only the code that reads interrupt status and not the code that masks
interrupts etc. Also if you compare the code with legacy-hal you'll
see that legacy-hal does mostly the same (we just handle all cases and
have better abstraction on ath5k.h + the RXNOFRM/TXNOFRM trick) + most
of the code is about interrupts we don't enable anyway. Anyway i think
i found the problem (on base.c we don't unmask TXDESC on PIMR), i'll
do a few more tests and come back to you asap ;-)


--
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick

2008-11-02 20:49:18

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, Nov 2, 2008 at 10:41 AM, Bob Copeland <[email protected]> wrote:
> On Sun, Nov 02, 2008 at 11:10:04AM +0200, Nick Kossifidis wrote:
>> Can you test the attached patch ?
>>
>> It works for me (tested on a AR5413)...
>>
>> makis linux # iperf -s
>> ------------------------------------------------------------
>> Server listening on TCP port 5001
>> TCP window size: 85.3 KByte (default)
>> ------------------------------------------------------------
>> [ 4] local 192.168.1.104 port 5001 connected with 192.168.1.10 port 43192
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.0-10.6 sec 21.8 MBytes 17.2 Mbits/sec
>
> This patch works for me (I was seeing the same stalls).
>
> I somehow managed to trigger this after reloading the module:
>
> Nov 2 13:12:05 sludge kernel: ------------[ cut here ]------------
> Nov 2 13:12:05 sludge kernel: WARNING: at net/mac80211/main.c:236
> ieee80211_hw_config+0x82/0x8c [mac80211]()
> Nov 2 13:12:05 sludge kernel: Modules linked in: ath5k af_packet
> sha256_generic aes_i586 aes_generic cbc loop i915 drm binfmt_misc
> acpi_cpufreq fan container nls_utf8 hfsplus dm_crypt dm_mod kvm_intel
> kvm fuse sbp2 snd_hda_intel snd_pcm_oss snd_pcm snd_mixer_oss appletouch
> hid_apple snd_seq_dummy snd_seq_oss arc4 ecb snd_seq_midi usbhid
> snd_rawmidi snd_seq_midi_event mac80211 snd_seq ohci1394 snd_timer
> snd_seq_device sr_mod ieee1394 sky2 cfg80211 sg cdrom rtc ehci_hcd
> uhci_hcd thermal bitrev crc32 snd snd_page_alloc battery ac processor
> button evdev unix [last unloaded: ath5k]
> Nov 2 13:12:05 sludge kernel: Pid: 9748, comm: ath5k_pci Tainted: G
> W 2.6.28-rc2-wl #19
> [...]
>
> But I think that was just a coincidence with the hardware getting
> stuck in a bad state -- after reboot it was fine. If not, this at least
> papers over the warning :)
>
> diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
> index a298e8a..0308165 100644
> --- a/drivers/net/wireless/ath5k/base.c
> +++ b/drivers/net/wireless/ath5k/base.c
> @@ -2796,7 +2796,8 @@ ath5k_config(struct ieee80211_hw *hw, u32 changed)
> sc->bintval = conf->beacon_int;
> sc->power_level = conf->power_level;
>
> - return ath5k_chan_set(sc, conf->channel);
> + ath5k_chan_set(sc, conf->channel);
> + return 0;
> }

Not sure I agree with the WARN_ON() if the driver's mac80211 config()
callback fails. In our case when we tune to a different channel we
have to clear any DMA operations first and then we reset the chip.
Reseting the chip can fail for whatever strange hw issue cases. The
patch fixes the complaint but is the complaint sane?

Luis

2008-11-03 07:55:06

by Johannes Berg

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, 2008-11-02 at 23:51 -0800, Luis R. Rodriguez wrote:
> On Sun, Nov 2, 2008 at 11:34 PM, Johannes Berg
> <[email protected]> wrote:
> > On Sun, 2008-11-02 at 23:29 -0800, Luis R. Rodriguez wrote:
> >
> >> >> Not sure I agree with the WARN_ON() if the driver's mac80211 config()
> >> >> callback fails. In our case when we tune to a different channel we
> >> >> have to clear any DMA operations first and then we reset the chip.
> >> >> Reseting the chip can fail for whatever strange hw issue cases. The
> >> >> patch fixes the complaint but is the complaint sane?
> >> >
> >> > What's mac80211 to do when it fails?
> >>
> >> How about a shiny new nl80211 event?
> >
> > And wtf would userspace do?
>
> If multiple configuration attempts fail perhaps stop trying, and
> inform the user?

But you're saying it's "normal" to get this failure, so wouldn't it
always do that sooner or later and always say your hw is broken? Also,
that's a bad thing to do in userspace, imho. Why can it fail here
anyway? hw borked is one obvious case, but it shouldn't happen enough
for this to be a problem yet.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-11-03 07:29:48

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, Nov 2, 2008 at 11:18 PM, Johannes Berg
<[email protected]> wrote:
> On Sun, 2008-11-02 at 12:49 -0800, Luis R. Rodriguez wrote:
>> On Sun, Nov 2, 2008 at 10:41 AM, Bob Copeland <[email protected]> wrote:
>> > On Sun, Nov 02, 2008 at 11:10:04AM +0200, Nick Kossifidis wrote:
>> >> Can you test the attached patch ?
>> >>
>> >> It works for me (tested on a AR5413)...
>> >>
>> >> makis linux # iperf -s
>> >> ------------------------------------------------------------
>> >> Server listening on TCP port 5001
>> >> TCP window size: 85.3 KByte (default)
>> >> ------------------------------------------------------------
>> >> [ 4] local 192.168.1.104 port 5001 connected with 192.168.1.10 port 43192
>> >> [ ID] Interval Transfer Bandwidth
>> >> [ 4] 0.0-10.6 sec 21.8 MBytes 17.2 Mbits/sec
>> >
>> > This patch works for me (I was seeing the same stalls).
>> >
>> > I somehow managed to trigger this after reloading the module:
>> >
>> > Nov 2 13:12:05 sludge kernel: ------------[ cut here ]------------
>> > Nov 2 13:12:05 sludge kernel: WARNING: at net/mac80211/main.c:236
>> > ieee80211_hw_config+0x82/0x8c [mac80211]()
>> > Nov 2 13:12:05 sludge kernel: Modules linked in: ath5k af_packet
>> > sha256_generic aes_i586 aes_generic cbc loop i915 drm binfmt_misc
>> > acpi_cpufreq fan container nls_utf8 hfsplus dm_crypt dm_mod kvm_intel
>> > kvm fuse sbp2 snd_hda_intel snd_pcm_oss snd_pcm snd_mixer_oss appletouch
>> > hid_apple snd_seq_dummy snd_seq_oss arc4 ecb snd_seq_midi usbhid
>> > snd_rawmidi snd_seq_midi_event mac80211 snd_seq ohci1394 snd_timer
>> > snd_seq_device sr_mod ieee1394 sky2 cfg80211 sg cdrom rtc ehci_hcd
>> > uhci_hcd thermal bitrev crc32 snd snd_page_alloc battery ac processor
>> > button evdev unix [last unloaded: ath5k]
>> > Nov 2 13:12:05 sludge kernel: Pid: 9748, comm: ath5k_pci Tainted: G
>> > W 2.6.28-rc2-wl #19
>> > [...]
>> >
>> > But I think that was just a coincidence with the hardware getting
>> > stuck in a bad state -- after reboot it was fine. If not, this at least
>> > papers over the warning :)
>> >
>> > diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
>> > index a298e8a..0308165 100644
>> > --- a/drivers/net/wireless/ath5k/base.c
>> > +++ b/drivers/net/wireless/ath5k/base.c
>> > @@ -2796,7 +2796,8 @@ ath5k_config(struct ieee80211_hw *hw, u32 changed)
>> > sc->bintval = conf->beacon_int;
>> > sc->power_level = conf->power_level;
>> >
>> > - return ath5k_chan_set(sc, conf->channel);
>> > + ath5k_chan_set(sc, conf->channel);
>> > + return 0;
>> > }
>>
>> Not sure I agree with the WARN_ON() if the driver's mac80211 config()
>> callback fails. In our case when we tune to a different channel we
>> have to clear any DMA operations first and then we reset the chip.
>> Reseting the chip can fail for whatever strange hw issue cases. The
>> patch fixes the complaint but is the complaint sane?
>
> What's mac80211 to do when it fails?

How about a shiny new nl80211 event?

Luis

2008-11-03 07:51:21

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, Nov 2, 2008 at 11:34 PM, Johannes Berg
<[email protected]> wrote:
> On Sun, 2008-11-02 at 23:29 -0800, Luis R. Rodriguez wrote:
>
>> >> Not sure I agree with the WARN_ON() if the driver's mac80211 config()
>> >> callback fails. In our case when we tune to a different channel we
>> >> have to clear any DMA operations first and then we reset the chip.
>> >> Reseting the chip can fail for whatever strange hw issue cases. The
>> >> patch fixes the complaint but is the complaint sane?
>> >
>> > What's mac80211 to do when it fails?
>>
>> How about a shiny new nl80211 event?
>
> And wtf would userspace do?

If multiple configuration attempts fail perhaps stop trying, and
inform the user?

Luis

2008-11-03 08:31:04

by Johannes Berg

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Mon, 2008-11-03 at 00:26 -0800, Luis R. Rodriguez wrote:

> >> What should we do with these rare failures then?
> >
> > print an error message? ignore them? try again?
>
> I'd prefer a simple error print than a WARN_ON().

Just so you don't get blamed on kerneloops.org? :)

> >> > hw borked is one obvious case, but it shouldn't happen enough
> >> > for this to be a problem yet.
> >>
> >> This I agree with. It is rare, its just possible, right now mac80211
> >> assumes it never will.
> >
> > It's _always_ assumed that by ignoring the return value, now it's just
> > noisy about it because clearly it doesn't like when the driver fails to
> > do what it wants since then the hw and sw states get out of sync. I
> > really don't see what to do other than retry maybe, but that might well
> > be done in the driver instead.
>
> As can be seen from the patch suggested what some drivers will end up
> doing is just ignoring failures but I guess that can be up to the
> drivers to deal with as you are suggesting. I'd be inclined to try to
> disable the device in case of a few failed resets to be specific with
> ath5k. Would mac80211 want to be informed of that through the return
> value? Is the WARN still appropriate?

I think the warning is appropriate, yes, since mac80211 has no concept
of failing hardware configuration. You're free to add such a concept,
but I really don't know what mac80211 would be supposed to do when
things fail, just try to stop/start? panic()? Instruct the user to
reboot? to swap hardware? See?

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-11-02 09:10:05

by Nick Kossifidis

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

2008/11/2 Nick Kossifidis <[email protected]>:
> 2008/11/1 Luis R. Rodriguez <[email protected]>:
>> On Wed, Oct 29, 2008 at 12:19 PM, Tim Gardner <[email protected]> wrote:
>>> This commit 'ath5k: Update interrupt masking code' causes stalls under
>>> load when built using compat-wireless on 2.6.27. There is no output from
>>> dmesg, but it occasionally recovers and moves a few more packets before
>>> stalling again. Luis suggested reverting
>>> eb9d4e8399181357cb6f6625ba7f849987432c6c which I did. Now it appears to
>>> work normally.
>>>
>>> I also think this patch should be broken into multiple pieces. There
>>> appear to be at least 3 functional changes addressed in this single patch.
>>
>> Agreed, I just tested with the latest wireless-testing and the issue
>> is still present. I reviewed the commit but its a big too large to pin
>> point the exact issue, my AR5414 goes up but then quickly becomes
>> unusable. Nick how about we revert this and you can then split this up
>> into a good number of patches to make the review easier and also to be
>> able to pin point the exact issue better?
>>
>> The SHA1 remains the same but just in case the title of the patch is:
>>
>> ath5k: Update interrupt masking code
>>
>> With it reverted I'm cruising and can ditch MadWifi completely, which
>> is the idea.
>>
>> Luis
>>
>

Can you test the attached patch ?

It works for me (tested on a AR5413)...

makis linux # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.1.104 port 5001 connected with 192.168.1.10 port 43192
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.6 sec 21.8 MBytes 17.2 Mbits/sec

makis linux # iwconfig
lo no wireless extensions.

eth0 no wireless extensions.

wmaster0 no wireless extensions.

wlan1 IEEE 802.11abg ESSID:"MickFlemm"
Mode:Managed Frequency:2.437 GHz Access Point: 00:1D:7E:AE:95:44
Bit Rate=24 Mb/s Tx-Power=20 dBm
Retry min limit:7 RTS thr:off Fragment thr=2352 B
Encryption key:off
Power Management:off
Link Quality=97/100 Signal level:-60 dBm Noise level=-94 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0





--
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick


Attachments:
(No filename) (2.46 kB)
ath5k-imr.patch (728.00 B)
Download all attachments

2008-11-04 00:15:37

by Reinette Chatre

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Mon, 2008-11-03 at 00:40 -0800, Luis R. Rodriguez wrote:
> On Mon, Nov 3, 2008 at 12:30 AM, Johannes Berg
> <[email protected]> wrote:
> > On Mon, 2008-11-03 at 00:26 -0800, Luis R. Rodriguez wrote:
> >
> >> >> What should we do with these rare failures then?
> >> >
> >> > print an error message? ignore them? try again?
> >>
> >> I'd prefer a simple error print than a WARN_ON().
> >
> > Just so you don't get blamed on kerneloops.org? :)
>
> No because like I said a hw reset can fail and don't think software
> should be blamed. Right now we do that.

fyi ...

We also have to deal with this in iwlwifi now because the driver chooses
not to send commands to the hardware if rfkill is enabled. If this is
the case (command needs to go to hw, but rfkill is enabled) then we
currently return an error ... triggering this warning.

We cannot return success in this case and queuing the command for later
does not make sense.

Reinette



2008-11-03 08:40:02

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Mon, Nov 3, 2008 at 12:30 AM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2008-11-03 at 00:26 -0800, Luis R. Rodriguez wrote:
>
>> >> What should we do with these rare failures then?
>> >
>> > print an error message? ignore them? try again?
>>
>> I'd prefer a simple error print than a WARN_ON().
>
> Just so you don't get blamed on kerneloops.org? :)

No because like I said a hw reset can fail and don't think software
should be blamed. Right now we do that.

>> >> > hw borked is one obvious case, but it shouldn't happen enough
>> >> > for this to be a problem yet.
>> >>
>> >> This I agree with. It is rare, its just possible, right now mac80211
>> >> assumes it never will.
>> >
>> > It's _always_ assumed that by ignoring the return value, now it's just
>> > noisy about it because clearly it doesn't like when the driver fails to
>> > do what it wants since then the hw and sw states get out of sync. I
>> > really don't see what to do other than retry maybe, but that might well
>> > be done in the driver instead.
>>
>> As can be seen from the patch suggested what some drivers will end up
>> doing is just ignoring failures but I guess that can be up to the
>> drivers to deal with as you are suggesting. I'd be inclined to try to
>> disable the device in case of a few failed resets to be specific with
>> ath5k. Would mac80211 want to be informed of that through the return
>> value? Is the WARN still appropriate?
>
> I think the warning is appropriate, yes, since mac80211 has no concept
> of failing hardware configuration. You're free to add such a concept,
> but I really don't know what mac80211 would be supposed to do when
> things fail, just try to stop/start? panic()? Instruct the user to
> reboot? to swap hardware? See?

Sure, OK then the error will be "deal with".

Luis

2008-11-03 08:26:05

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Mon, Nov 3, 2008 at 12:04 AM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2008-11-03 at 00:01 -0800, Luis R. Rodriguez wrote:
>
>> > But you're saying it's "normal" to get this failure,
>>
>> No, I'm just sayings its possible, right now mac80211 assumes its not
>> and if it does its because we somehow lied to mac80211 of our
>> capabilities.
>
> Well, yes, sort of.
>
>> > so wouldn't it
>> > always do that sooner or later and always say your hw is broken?
>>
>> Nope
>
> Why not?
>
>> > Also,
>> > that's a bad thing to do in userspace, imho.
>>
>> What should we do with these rare failures then?
>
> print an error message? ignore them? try again?

I'd prefer a simple error print than a WARN_ON().

>> > hw borked is one obvious case, but it shouldn't happen enough
>> > for this to be a problem yet.
>>
>> This I agree with. It is rare, its just possible, right now mac80211
>> assumes it never will.
>
> It's _always_ assumed that by ignoring the return value, now it's just
> noisy about it because clearly it doesn't like when the driver fails to
> do what it wants since then the hw and sw states get out of sync. I
> really don't see what to do other than retry maybe, but that might well
> be done in the driver instead.

As can be seen from the patch suggested what some drivers will end up
doing is just ignoring failures but I guess that can be up to the
drivers to deal with as you are suggesting. I'd be inclined to try to
disable the device in case of a few failed resets to be specific with
ath5k. Would mac80211 want to be informed of that through the return
value? Is the WARN still appropriate?

Luis

2008-11-03 08:01:16

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, Nov 2, 2008 at 11:54 PM, Johannes Berg
<[email protected]> wrote:
> On Sun, 2008-11-02 at 23:51 -0800, Luis R. Rodriguez wrote:
>> On Sun, Nov 2, 2008 at 11:34 PM, Johannes Berg
>> <[email protected]> wrote:
>> > On Sun, 2008-11-02 at 23:29 -0800, Luis R. Rodriguez wrote:
>> >
>> >> >> Not sure I agree with the WARN_ON() if the driver's mac80211 config()
>> >> >> callback fails. In our case when we tune to a different channel we
>> >> >> have to clear any DMA operations first and then we reset the chip.
>> >> >> Reseting the chip can fail for whatever strange hw issue cases. The
>> >> >> patch fixes the complaint but is the complaint sane?
>> >> >
>> >> > What's mac80211 to do when it fails?
>> >>
>> >> How about a shiny new nl80211 event?
>> >
>> > And wtf would userspace do?
>>
>> If multiple configuration attempts fail perhaps stop trying, and
>> inform the user?
>
> But you're saying it's "normal" to get this failure,

No, I'm just sayings its possible, right now mac80211 assumes its not
and if it does its because we somehow lied to mac80211 of our
capabilities.

> so wouldn't it
> always do that sooner or later and always say your hw is broken?

Nope

> Also,
> that's a bad thing to do in userspace, imho.

What should we do with these rare failures then?

> Why can it fail here
> anyway?

Look at all the reasons for which a hw reset can fail for ath9k and ath5k.

> hw borked is one obvious case, but it shouldn't happen enough
> for this to be a problem yet.

This I agree with. It is rare, its just possible, right now mac80211
assumes it never will.

Luis

2008-11-03 08:05:22

by Johannes Berg

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Mon, 2008-11-03 at 00:01 -0800, Luis R. Rodriguez wrote:

> > But you're saying it's "normal" to get this failure,
>
> No, I'm just sayings its possible, right now mac80211 assumes its not
> and if it does its because we somehow lied to mac80211 of our
> capabilities.

Well, yes, sort of.

> > so wouldn't it
> > always do that sooner or later and always say your hw is broken?
>
> Nope

Why not?

> > Also,
> > that's a bad thing to do in userspace, imho.
>
> What should we do with these rare failures then?

print an error message? ignore them? try again?

> > hw borked is one obvious case, but it shouldn't happen enough
> > for this to be a problem yet.
>
> This I agree with. It is rare, its just possible, right now mac80211
> assumes it never will.

It's _always_ assumed that by ignoring the return value, now it's just
noisy about it because clearly it doesn't like when the driver fails to
do what it wants since then the hw and sw states get out of sync. I
really don't see what to do other than retry maybe, but that might well
be done in the driver instead.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-11-03 14:42:36

by Bob Copeland

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Mon, Nov 3, 2008 at 3:26 AM, Luis R. Rodriguez <[email protected]> wrote:
> As can be seen from the patch suggested what some drivers will end up
> doing is just ignoring failures but I guess that can be up to the
> drivers to deal with as you are suggesting.

FWIW my patch returning 0 in every case wasn't to be taken seriously.
However, in my case I got a never-ending series of such warnings which
was not a particularly nice failure scenario as opposed to something
like "ath0: hardware is hosed, disabling." That's certainly something
the driver could do (but then why bother returning anything from the
function?)

--
Bob Copeland %% http://www.bobcopeland.com

2008-11-04 08:38:45

by Tomas Winkler

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Tue, Nov 4, 2008 at 9:12 AM, Kalle Valo <[email protected]> wrote:
> Johannes Berg <[email protected]> writes:
>
>>> > Also,
>>> > that's a bad thing to do in userspace, imho.
>>>
>>> What should we do with these rare failures then?
>>
>> print an error message? ignore them? try again?
>
> I would say that try few (three?) times and if it still fails, restart
> firmware and push the same settings again to the firmware. There's
> very little that mac80211 could do here, driver has to make the
> decisions in case like this.

Mac should be aware that card is iin TODO list.
Tomas

2008-11-03 07:35:26

by Johannes Berg

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, 2008-11-02 at 23:29 -0800, Luis R. Rodriguez wrote:

> >> Not sure I agree with the WARN_ON() if the driver's mac80211 config()
> >> callback fails. In our case when we tune to a different channel we
> >> have to clear any DMA operations first and then we reset the chip.
> >> Reseting the chip can fail for whatever strange hw issue cases. The
> >> patch fixes the complaint but is the complaint sane?
> >
> > What's mac80211 to do when it fails?
>
> How about a shiny new nl80211 event?

And wtf would userspace do?

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-11-03 07:18:53

by Johannes Berg

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, 2008-11-02 at 12:49 -0800, Luis R. Rodriguez wrote:
> On Sun, Nov 2, 2008 at 10:41 AM, Bob Copeland <[email protected]> wrote:
> > On Sun, Nov 02, 2008 at 11:10:04AM +0200, Nick Kossifidis wrote:
> >> Can you test the attached patch ?
> >>
> >> It works for me (tested on a AR5413)...
> >>
> >> makis linux # iperf -s
> >> ------------------------------------------------------------
> >> Server listening on TCP port 5001
> >> TCP window size: 85.3 KByte (default)
> >> ------------------------------------------------------------
> >> [ 4] local 192.168.1.104 port 5001 connected with 192.168.1.10 port 43192
> >> [ ID] Interval Transfer Bandwidth
> >> [ 4] 0.0-10.6 sec 21.8 MBytes 17.2 Mbits/sec
> >
> > This patch works for me (I was seeing the same stalls).
> >
> > I somehow managed to trigger this after reloading the module:
> >
> > Nov 2 13:12:05 sludge kernel: ------------[ cut here ]------------
> > Nov 2 13:12:05 sludge kernel: WARNING: at net/mac80211/main.c:236
> > ieee80211_hw_config+0x82/0x8c [mac80211]()
> > Nov 2 13:12:05 sludge kernel: Modules linked in: ath5k af_packet
> > sha256_generic aes_i586 aes_generic cbc loop i915 drm binfmt_misc
> > acpi_cpufreq fan container nls_utf8 hfsplus dm_crypt dm_mod kvm_intel
> > kvm fuse sbp2 snd_hda_intel snd_pcm_oss snd_pcm snd_mixer_oss appletouch
> > hid_apple snd_seq_dummy snd_seq_oss arc4 ecb snd_seq_midi usbhid
> > snd_rawmidi snd_seq_midi_event mac80211 snd_seq ohci1394 snd_timer
> > snd_seq_device sr_mod ieee1394 sky2 cfg80211 sg cdrom rtc ehci_hcd
> > uhci_hcd thermal bitrev crc32 snd snd_page_alloc battery ac processor
> > button evdev unix [last unloaded: ath5k]
> > Nov 2 13:12:05 sludge kernel: Pid: 9748, comm: ath5k_pci Tainted: G
> > W 2.6.28-rc2-wl #19
> > [...]
> >
> > But I think that was just a coincidence with the hardware getting
> > stuck in a bad state -- after reboot it was fine. If not, this at least
> > papers over the warning :)
> >
> > diff --git a/drivers/net/wireless/ath5k/base.c b/drivers/net/wireless/ath5k/base.c
> > index a298e8a..0308165 100644
> > --- a/drivers/net/wireless/ath5k/base.c
> > +++ b/drivers/net/wireless/ath5k/base.c
> > @@ -2796,7 +2796,8 @@ ath5k_config(struct ieee80211_hw *hw, u32 changed)
> > sc->bintval = conf->beacon_int;
> > sc->power_level = conf->power_level;
> >
> > - return ath5k_chan_set(sc, conf->channel);
> > + ath5k_chan_set(sc, conf->channel);
> > + return 0;
> > }
>
> Not sure I agree with the WARN_ON() if the driver's mac80211 config()
> callback fails. In our case when we tune to a different channel we
> have to clear any DMA operations first and then we reset the chip.
> Reseting the chip can fail for whatever strange hw issue cases. The
> patch fixes the complaint but is the complaint sane?

What's mac80211 to do when it fails?

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2008-11-02 20:40:35

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Sun, Nov 2, 2008 at 1:10 AM, Nick Kossifidis <[email protected]> wrote:
> 2008/11/2 Nick Kossifidis <[email protected]>:
>> 2008/11/1 Luis R. Rodriguez <[email protected]>:
>>> On Wed, Oct 29, 2008 at 12:19 PM, Tim Gardner <[email protected]> wrote:
>>>> This commit 'ath5k: Update interrupt masking code' causes stalls under
>>>> load when built using compat-wireless on 2.6.27. There is no output from
>>>> dmesg, but it occasionally recovers and moves a few more packets before
>>>> stalling again. Luis suggested reverting
>>>> eb9d4e8399181357cb6f6625ba7f849987432c6c which I did. Now it appears to
>>>> work normally.
>>>>
>>>> I also think this patch should be broken into multiple pieces. There
>>>> appear to be at least 3 functional changes addressed in this single patch.
>>>
>>> Agreed, I just tested with the latest wireless-testing and the issue
>>> is still present. I reviewed the commit but its a big too large to pin
>>> point the exact issue, my AR5414 goes up but then quickly becomes
>>> unusable. Nick how about we revert this and you can then split this up
>>> into a good number of patches to make the review easier and also to be
>>> able to pin point the exact issue better?
>>>
>>> The SHA1 remains the same but just in case the title of the patch is:
>>>
>>> ath5k: Update interrupt masking code
>>>
>>> With it reverted I'm cruising and can ditch MadWifi completely, which
>>> is the idea.
>>>
>>> Luis
>>>
>>
>
> Can you test the attached patch ?
>
> It works for me (tested on a AR5413)...

Thanks, that fixes it.

Luis

2008-11-01 21:26:33

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: wireless-testing commit eb9d4e8399181357cb6f6625ba7f849987432c6c causes stalls

On Wed, Oct 29, 2008 at 12:19 PM, Tim Gardner <[email protected]> wrote:
> This commit 'ath5k: Update interrupt masking code' causes stalls under
> load when built using compat-wireless on 2.6.27. There is no output from
> dmesg, but it occasionally recovers and moves a few more packets before
> stalling again. Luis suggested reverting
> eb9d4e8399181357cb6f6625ba7f849987432c6c which I did. Now it appears to
> work normally.
>
> I also think this patch should be broken into multiple pieces. There
> appear to be at least 3 functional changes addressed in this single patch.

Agreed, I just tested with the latest wireless-testing and the issue
is still present. I reviewed the commit but its a big too large to pin
point the exact issue, my AR5414 goes up but then quickly becomes
unusable. Nick how about we revert this and you can then split this up
into a good number of patches to make the review easier and also to be
able to pin point the exact issue better?

The SHA1 remains the same but just in case the title of the patch is:

ath5k: Update interrupt masking code

With it reverted I'm cruising and can ditch MadWifi completely, which
is the idea.

Luis