2017-12-22 00:12:28

by Flávio Silveira

Subject: ath9k dropping connections

Good evening,

  I have a wireless router running LEDE Reboot (SNAPSHOT,
r5445-04127f0fec) with kernel 4.9.65, and I'm seeing connections being
dropped without any error in dmesg. To fix it, I have to restart Wi-Fi by
issuing "wifi down; wifi up".

  It was previously running OpenWrt Barrier Breaker and showed the
same issue, except that it would give me an error similar to this one:
"ath: phy0: Failed to stop TX DMA, queues=0x005!"

  After some research, it seems this error is related to environmental
interference and was fixed in later kernel versions, which is why I gave
LEDE a try. That makes sense, because this router worked wonderfully for
at least a year before this issue appeared.

  The reason I am sending this e-mail is to ask for suggestions on how I
can debug it properly, because, as noted above, dmesg shows no errors.

Regards,
  Flavio Silveira


2017-12-29 19:16:13

by Alexander Wetzel

Subject: Re: ath9k dropping connections

Hello,

I guess there are quite a few potential issues that could cause that.
Since you do not see any error logs, one of them could be the WPA rekey IV poisoning issue I encountered in the past.

Do you have WPA rekeying enabled? If so, please check whether disabling it fixes the issue for you.
Assuming you are only using a WPA password and not EAP, you can check whether rekeying is enabled from a CLI on the router with the following command:
"uci show wireless | grep wpa_pair_rekey"

You have to delete any config statements found by the command and restart the router (or wifi) to apply the changes.
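A minimal sketch of that check-and-delete step as POSIX shell functions, assuming the stock LEDE/OpenWrt `uci` CLI and that matching options appear in `uci show` output as `<path>=<value>`. The function names `find_pair_rekey_opts` and `disable_pair_rekey` are hypothetical helpers, not part of LEDE; the restart uses the same "wifi down; wifi up" sequence mentioned earlier in the thread.

```shell
# find_pair_rekey_opts (hypothetical helper): read `uci show wireless`
# output on stdin and print the uci path of every wpa_pair_rekey option,
# one per line, e.g. wireless.default_radio0.wpa_pair_rekey
find_pair_rekey_opts() {
    grep '\.wpa_pair_rekey=' | cut -d= -f1
}

# disable_pair_rekey (hypothetical helper): delete each wpa_pair_rekey
# option found, commit the wireless config and restart Wi-Fi.
disable_pair_rekey() {
    opts=$(uci show wireless | find_pair_rekey_opts)
    if [ -z "$opts" ]; then
        echo "no wpa_pair_rekey options set; nothing to do"
        return 0
    fi
    # read line by line so paths like wireless.@wifi-iface[0]... are
    # never subject to shell globbing
    printf '%s\n' "$opts" | while read -r opt; do
        uci delete "$opt"
    done
    uci commit wireless
    wifi down
    wifi up
}
```

After sourcing the functions on the router, running `disable_pair_rekey` applies the change; keeping the parsing in its own function makes it easy to inspect what would be deleted (`uci show wireless | find_pair_rekey_opts`) before touching the config.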

If that helps, at least one STA is sending frames using wrong IVs and tricking the other end into dropping valid frames.
In that case I can offer some options for working around it, but I'm not aware of a real fix, and disabling rekeying is still probably the best option...

Regards,

Alexander


2017-12-22 06:28:47

by Rosen Penev

Subject: Re: ath9k dropping connections

On Thu, Dec 21, 2017 at 4:12 PM, Flávio Silveira
<[email protected]> wrote:
> Good evening,
>
> I have a wireless router running LEDE Reboot (SNAPSHOT, r5445-04127f0fec),
> with kernel 4.9.65 and I'm seeing connections being dropped without any
> error in dmesg. To fix I have to restart wifi by issuing "wifi down; wifi
> up"
>
> It was previously running OpenWrt Barrier Breaker and was presenting the
> same issue, except it would give me an error similar to this one: "ath:
> phy0: Failed to stop TX DMA, queues=0x005!"
>
From a quick glance at the code, it used to be implemented as ath_err.
It now seems to be implemented as ath_dbg. In other words, if you turn
on debugging, you should see it.
> After doing some research it seems this error is related to environment
> interference and it was fixed in later kernel versions, the reason I gave
> LEDE a try. It makes sense, because this router worked wonderfully for at
> least a year before giving this issue.
>
> The reason I am sending this e-mail is to receive suggestions on how can I
> debug it properly, because, as pointed above, dmesg shows no errors.
>
> Regards,
> Flavio Silveira

2017-12-22 19:50:20

by Flávio Silveira

Subject: Re: ath9k dropping connections



On 22/12/2017 04:28, Rosen Penev wrote:
> On Thu, Dec 21, 2017 at 4:12 PM, Flávio Silveira
> <[email protected]> wrote:
>> Good evening,
>>
>> I have a wireless router running LEDE Reboot (SNAPSHOT, r5445-04127f0fec),
>> with kernel 4.9.65 and I'm seeing connections being dropped without any
>> error in dmesg. To fix I have to restart wifi by issuing "wifi down; wifi
>> up"
>>
>> It was previously running OpenWrt Barrier Breaker and was presenting the
>> same issue, except it would give me an error similar to this one: "ath:
>> phy0: Failed to stop TX DMA, queues=0x005!"
>>
> From a quick glance at the code, it used to be implemented as ath_err.
> It now seems to be implemented as ath_dbg. In other words, if you turn
> on debugging, you should see it.

Thank you, Rosen, for your quick reply!

  I will check how to turn on this debugging in LEDE to understand
what the issue is.
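A minimal sketch of how that debugging could be switched on at runtime, assuming the kernel was built with ath debugging (CONFIG_ATH_DEBUG; without it ath_dbg() messages are compiled out), debugfs is mounted at /sys/kernel/debug, and the radio is phy0. It writes a debug mask to ath9k's debugfs `debug` file. The function name `set_ath9k_debug` and the `DEBUGFS_80211` override are hypothetical, added only so the path can be changed.

```shell
# Root of the mac80211 debugfs tree. Assumption: debugfs is mounted at
# /sys/kernel/debug; override DEBUGFS_80211 if it lives elsewhere.
DEBUGFS_80211="${DEBUGFS_80211:-/sys/kernel/debug/ieee80211}"

# set_ath9k_debug PHY MASK (hypothetical helper): write MASK to the
# ath9k "debug" file of PHY, e.g. set_ath9k_debug phy0 0xffffffff
set_ath9k_debug() {
    phy="$1"
    mask="$2"
    f="$DEBUGFS_80211/$phy/ath9k/debug"
    # the file is only present when ath debugging is compiled in
    if [ ! -w "$f" ]; then
        echo "cannot write $f (ath debugging disabled or debugfs not mounted?)" >&2
        return 1
    fi
    echo "$mask" > "$f"
}
```

For example, `set_ath9k_debug phy0 0xffffffff` would enable every debug category, after which the extra messages should show up in dmesg. That is very noisy; a narrower mask is possible, but which bit covers the TX DMA message should be checked against enum ATH_DEBUG in drivers/net/wireless/ath/ath.h rather than guessed.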

Regards,
  Flavio Silveira

2018-01-10 21:39:23

by Flávio Silveira

Subject: Re: ath9k dropping connections



On 29/12/2017 15:59, Alexander Wetzel wrote:
> Hello,
>
> I guess there are quite a few potential issues that could cause that.
> Since you do not see any error logs, one of them could be the WPA rekey IV poisoning issue I encountered in the past.
>
> Do you have WPA rekeying enabled? If so, please check whether disabling it fixes the issue for you.
> Assuming you are only using a WPA password and not EAP, you can check whether rekeying is enabled from a CLI on the router with the following command:
> "uci show wireless | grep wpa_pair_rekey"
>
> You have to delete any config statements found by the command and restart the router (or wifi) to apply the changes.
>
> If that helps, at least one STA is sending frames using wrong IVs and tricking the other end into dropping valid frames.
> In that case I can offer some options for working around it, but I'm not aware of a real fix, and disabling rekeying is still probably the best option...
>
> Regards,
>
> Alexander

Hello Alexander, thanks for your reply!

  Unfortunately "uci show wireless | grep wpa_pair_rekey" returns
nothing, so I guess this is not the issue.

  I hadn't realized I was running a LEDE snapshot with kernel 4.9.65.
Since then, I have built LEDE 17.01.4, which has kernel 4.4.92, with ATH
debug enabled. What I can see is that the "pending" count for the BE
(best effort) queue increases when the issue occurs. Is there anything
else I could do to help understand why this is happening?
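One way to pin down when the BE backlog starts growing is to sample ath9k's debugfs `queues` file periodically and timestamp each reading, so a rise can be lined up with the moment connections drop. This is a sketch under assumptions: the file path /sys/kernel/debug/ieee80211/<phy>/ath9k/queues and the line format shown in the comment are recalled from the ath9k debugfs code of this era and should be checked against the real output; `be_pending` and `watch_be_pending` are hypothetical helpers.

```shell
# be_pending (hypothetical helper): read ath9k "queues" output on stdin
# and print the "pending" value from the (BE) line. Assumed line format:
#   (BE):  qnum: 2 qdepth:  0 ampdu-depth:  0 pending:   5 stopped: 0
be_pending() {
    awk '/^\(BE\)/ { for (i = 1; i < NF; i++) if ($i == "pending:") print $(i + 1) }'
}

# watch_be_pending PHY [INTERVAL] (hypothetical helper): print a
# timestamped BE pending count every INTERVAL seconds (default 5)
# until interrupted with Ctrl-C.
watch_be_pending() {
    q="/sys/kernel/debug/ieee80211/$1/ath9k/queues"
    while :; do
        echo "$(date +%T) BE pending: $(be_pending < "$q")"
        sleep "${2:-5}"
    done
}
```

Running `watch_be_pending phy0 2` in one SSH session while reproducing the drop would show whether the BE count climbs gradually or jumps at the moment clients lose connectivity.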

  I don't think I mentioned in my first post that I am using virtual
interfaces; I don't know whether it matters.

Regards,
  Flavio Silveira