2018-01-26 22:53:46

by Ben Greear

Subject: ath9k will not tx packets sometimes.

I'm doing a test with 200 virtual stations on each of 6 ath9k radios.

When I configure stations for DHCP, I see cases where stations on a particular
radio will not transmit anything sometimes. I see no 'XMIT' logs that show indication of
frames being received in the driver from the upper stack, but if I use 'tshark' on
a station interface, it shows frames being 'transmitted'.

I do, however, see this, which looks like it might show
an issue. It looks like whatever 'aqm' is, it has an ever expanding number
of backlog packets:

[root@2u-6n lanforge]# cat /debug/ieee80211/wiphy2/netdev\:sta30194/stations/00\:0e\:8e\:69\:b8\:f7/aqm
target 49999us interval 299999us ecn no
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
0 2 27616 440 9 0 0 0 428 998 7
1 3 0 0 0 0 0 0 0 0 0
2 3 0 0 0 0 0 0 0 0 0
3 2 0 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0 0
5 1 0 0 0 0 0 0 0 0 0
6 0 0 0 1 0 0 0 0 390 1
7 0 0 0 637 0 0 0 792 22072 903
8 2 0 0 0 0 0 0 0 0 0
9 3 0 0 0 0 0 0 0 0 0
10 3 0 0 0 0 0 0 0 0 0
11 2 0 0 0 0 0 0 0 0 0
12 1 0 0 0 0 0 0 0 0 0
13 1 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0


Anyone have any pointers as to whether this might be a real issue or not? I'll go
digging in the meantime....

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2018-01-30 12:07:34

by Toke Høiland-Jørgensen

Subject: Re: ath9k will not tx packets sometimes.

Ben Greear <[email protected]> writes:

> Maybe there is some way for the scheduler to get stuck and not
> schedule anything?

It would appear so, yeah. Do you do anything special other than
associate a whole bunch of stations to trigger this? I can try to see if
I can script something that works equivalently on my setup.

I'm actually reworking that whole scheduler logic and moving
some of it into mac80211. Could you test this (WiP) patch and see whether
it has the same problem?

-Toke


Attachments:
mac80211-add-txq-scheduling.patch (28.80 kB)

2018-01-27 13:17:47

by Toke Høiland-Jørgensen

Subject: Re: ath9k will not tx packets sometimes.

Ben Greear <[email protected]> writes:

> I'm doing a test with 200 virtual stations on each of 6 ath9k radios.
>
> When I configure stations for DHCP, I see cases where stations on a particular
> radio will not transmit anything sometimes. I see no 'XMIT' logs that show indication of
> frames being received in the driver from the upper stack, but if I use 'tshark' on
> a station interface, it shows frames being 'transmitted'.
>
> I do, however, see this, which looks like it might show
> an issue. It looks like whatever 'aqm' is, it has an ever expanding number
> of backlog packets:

The aqm is the intermediate queues in mac80211. So this indicates that
the driver is not pulling packets for transmission.
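
A minimal sketch of how one could spot this condition from a dump, assuming the column layout of the aqm output quoted above. The function names are made up for illustration; the only input is the text of the debugfs file.

```python
def parse_aqm(text):
    """Parse a per-station aqm dump into a list of per-TID dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # The column header is the line that starts with "tid".
    header_idx = next(i for i, ln in enumerate(lines) if ln.startswith("tid "))
    columns = lines[header_idx].split()
    rows = []
    for ln in lines[header_idx + 1:]:
        # Split into at most len(columns) fields so that a trailing "flags"
        # value such as "0x6(RUN AMPDU NO-AMSDU)" stays in one piece.
        fields = ln.split(None, len(columns) - 1)
        row = {}
        for name, value in zip(columns, fields):
            row[name] = value if name == "flags" else int(value)
        rows.append(row)
    return rows


def backlogged_tids(rows):
    """Return (tid, backlog-packets, tx-packets) for TIDs holding packets."""
    return [(r["tid"], r["backlog-packets"], r["tx-packets"])
            for r in rows if r["backlog-packets"] > 0]


# Three rows copied from the dump in the original report.
SAMPLE = """\
target 49999us interval 299999us ecn no
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets
0 2 27616 440 9 0 0 0 428 998 7
6 0 0 0 1 0 0 0 0 390 1
7 0 0 0 637 0 0 0 792 22072 903
"""

if __name__ == "__main__":
    for tid, backlog, sent in backlogged_tids(parse_aqm(SAMPLE)):
        print(f"tid {tid}: {backlog} packets queued, {sent} packets sent")
```

A TID whose backlog keeps growing between two reads while tx-packets stays flat would match the "driver is not pulling packets" picture.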

With that many stations, I wonder whether it is due to the airtime
fairness scheduler throttling the station? What are the contents of
debug/ieee80211/wiphy2/netdev\:sta30194/stations/00\:0e\:8e\:69\:b8\:f7/airtime
while the station is not transmitting? And is it all stations on that
particular radio, or only some of them?

-Toke

2018-01-30 20:09:12

by Sebastian Gottschall

Subject: Re: ath9k will not tx packets sometimes.

On 30.01.2018 at 19:55, Toke Høiland-Jørgensen wrote:
> Ben Greear <[email protected]> writes:
>
>>> I'm actually working on reworking that whole scheduler logic, and move
>>> some of it into mac80211. Could you test this (WiP) patch and see if
>>> that has the same problem?
>> It had some serious conflicts in ath10k, due to my local changes, so
>> I did not actually test this.
> Can send you a version without the ath10k changes tomorrow if you'd like
> to test - but will try to reproduce myself as well...
>
>> But, a revert of the atf patches (a6e56d749 and 63fefa050) appear to
>> have resolved the issue. I'll test more with these reverted, and maybe
>> will have time to work more on actually fixing upstream code next time
>> I move to a newer kernel (and/or after your pending changes get in).
> Ah, that narrows it down some. Well, that is the code I'm hacking on
> currently anyway, so let's see if we can't get it fixed as part of that
> series :)
I may have some additional information for you. In the same timeframe I
noticed increased memory usage on ath9k devices.
Maybe that helps. I now hit memory limits on embedded devices with
dual interfaces and just 32 MB of RAM, which wasn't the case before.
Is this patch worth trying from my side?

Sebastian

--
Best regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Registered office: Stubenwaldallee 21a, 64625 Bensheim
Court of registration: Amtsgericht Darmstadt, HRB 25473
Managing directors: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: [email protected]
Tel.: +496251-582650 / Fax: +496251-5826565

2018-01-29 21:34:39

by Ben Greear

Subject: Re: ath9k will not tx packets sometimes.

On 01/27/2018 05:11 AM, Toke Høiland-Jørgensen wrote:
> Ben Greear <[email protected]> writes:
>
>> I'm doing a test with 200 virtual stations on each of 6 ath9k radios.
>>
>> When I configure stations for DHCP, I see cases where stations on a particular
>> radio will not transmit anything sometimes. I see no 'XMIT' logs that show indication of
>> frames being received in the driver from the upper stack, but if I use 'tshark' on
>> a station interface, it shows frames being 'transmitted'.
>>
>> I do, however, see this, which looks like it might show
>> an issue. It looks like whatever 'aqm' is, it has an ever expanding number
>> of backlog packets:
>
> The aqm is the intermediate queues in mac80211. So this indicates that
> the driver is not pulling packets for transmission.
>
> With that many stations, I wonder whether it is due to the airtime
> fairness scheduler throttling the station? What is the contents of
> debug/ieee80211/wiphy2/netdev\:sta30194/stations/00\:0e\:8e\:69\:b8\:f7/airtime
> while the station is not transmitting? And is it all stations on that
> particular radio, or only some of them?

Here is the output of airtime and aqm on a hung station:

# cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/airtime
RX: 83706 us
TX: 4202 us
Deficit: VO: 198 us VI: 300 us BE: -8306 us BK: 300 us

# cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/aqm
target 49999us interval 299999us ecn no
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 3476 14 6 0 0 0 8 326 3 0x6(RUN AMPDU NO-AMSDU)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
7 0 0 0 4 0 0 0 6 168 7 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)


My tool will effectively down/up the interface after 30 seconds if DHCP has not
been acquired...here is another set of debug after that has happened:

# cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/airtime
RX: 0 us
TX: 3946 us
Deficit: VO: 254 us VI: 300 us BE: 300 us BK: 300 us

# cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/aqm
target 49999us interval 299999us ecn no
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 1376 6 2 0 0 0 4 0 0 0x6(RUN AMPDU NO-AMSDU)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
7 0 0 0 12 0 0 0 13 312 13 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

I was able to reproduce this on a system by configuring "only" 200 stations on
a single ath9k radio, so a dedicated individual could probably reproduce
this on their own system as well. Stock kernels are less optimized for this, but at least
for ath9k, it should function with multiple virtual station devices...

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2018-01-29 21:47:28

by Toke Høiland-Jørgensen

Subject: Re: ath9k will not tx packets sometimes.

Ben Greear <[email protected]> writes:

> On 01/27/2018 05:11 AM, Toke Høiland-Jørgensen wrote:
>> Ben Greear <[email protected]> writes:
>>
>>> I'm doing a test with 200 virtual stations on each of 6 ath9k radios.
>>>
>>> When I configure stations for DHCP, I see cases where stations on a particular
>>> radio will not transmit anything sometimes. I see no 'XMIT' logs that show indication of
>>> frames being received in the driver from the upper stack, but if I use 'tshark' on
>>> a station interface, it shows frames being 'transmitted'.
>>>
>>> I do, however, see this, which looks like it might show
>>> an issue. It looks like whatever 'aqm' is, it has an ever expanding number
>>> of backlog packets:
>>
>> The aqm is the intermediate queues in mac80211. So this indicates that
>> the driver is not pulling packets for transmission.
>>
>> With that many stations, I wonder whether it is due to the airtime
>> fairness scheduler throttling the station? What is the contents of
>> debug/ieee80211/wiphy2/netdev\:sta30194/stations/00\:0e\:8e\:69\:b8\:f7/airtime
>> while the station is not transmitting? And is it all stations on that
>> particular radio, or only some of them?
>
> Here is the output of airtime and aqm on a hung station:
>
> # cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/airtime
> RX: 83706 us
> TX: 4202 us
> Deficit: VO: 198 us VI: 300 us BE: -8306 us BK: 300 us

Right. This looks like incoming traffic is depleting the airtime quantum
faster than it can be replenished by the scheduler, which means that the
station gets completely starved.
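
To put a rough number on that, here is a small sketch that parses the airtime dump quoted above and estimates how many scheduler rounds the BE queue would need to climb back above zero. It assumes a 300 us quantum (inferred from the idle ACs in the dump reading 300 us), assumes a station is skipped while its deficit is zero or negative, and assumes no further airtime is charged in the meantime. The function names are illustrative only.

```python
import re


def parse_airtime(text):
    """Parse a per-station airtime dump into integers (microseconds)."""
    result = {"deficit": {}}
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(("RX:", "TX:")):
            key, value = line.split(":", 1)
            result[key.lower()] = int(value.split()[0])
        elif line.startswith("Deficit:"):
            # One "<AC>: <n> us" group per access category.
            for ac, us in re.findall(r"(VO|VI|BE|BK):\s*(-?\d+)\s*us", line):
                result["deficit"][ac] = int(us)
    return result


def rounds_to_recover(deficit_us, quantum_us=300):
    """Rounds until the deficit is positive again, assuming one quantum is
    added per round and nothing else is charged (both are assumptions)."""
    if deficit_us > 0:
        return 0
    return (-deficit_us) // quantum_us + 1


SAMPLE = """\
RX: 83706 us
TX: 4202 us
Deficit: VO: 198 us VI: 300 us BE: -8306 us BK: 300 us
"""

if __name__ == "__main__":
    info = parse_airtime(SAMPLE)
    for ac, deficit in info["deficit"].items():
        if deficit <= 0:
            rounds = rounds_to_recover(deficit)
            print(f"{ac}: deficit {deficit} us, {rounds} rounds to recover")
```

With the numbers above, BE needs 28 replenishment rounds before it is eligible again, and any RX airtime charged in between pushes that further out.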

Could you try turning off the airtime scheduler?

echo 0 > /sys/kernel/debug/ieee80211/wiphy0/ath9k/airtime_flags

and see if the problem goes away.

If it does, please check if the problem persists when setting
airtime_flags to 1 (which means only include TX airtime).
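
For reference, a small sketch of how the airtime_flags value could be read as a bitmask. Bit 0 follows the description above (1 means only TX airtime is counted, 0 means the scheduler is off); treating bit 1 as "count RX airtime" is an assumption about the driver's flag layout, and the constant and function names are illustrative.

```python
AIRTIME_USE_TX = 1 << 0  # from the text above: 1 = include TX airtime only
AIRTIME_USE_RX = 1 << 1  # assumption: next bit enables RX airtime accounting


def describe_airtime_flags(value):
    """Return a human-readable summary of an airtime_flags value."""
    parts = []
    if value & AIRTIME_USE_TX:
        parts.append("TX")
    if value & AIRTIME_USE_RX:
        parts.append("RX")
    if not parts:
        return "airtime scheduler off"
    return "charging " + " and ".join(parts) + " airtime"


if __name__ == "__main__":
    for value in (0, 1, 3):
        print(value, "->", describe_airtime_flags(value))
```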

-Toke

2018-01-29 22:35:55

by Toke Høiland-Jørgensen

Subject: Re: ath9k will not tx packets sometimes.

Ben Greear <[email protected]> writes:

> On 01/29/2018 01:47 PM, Toke Høiland-Jørgensen wrote:
>> Ben Greear <[email protected]> writes:
>>
>>> On 01/27/2018 05:11 AM, Toke Høiland-Jørgensen wrote:
>>>> Ben Greear <[email protected]> writes:
>>>>
>>>>> I'm doing a test with 200 virtual stations on each of 6 ath9k radios.
>>>>>
>>>>> When I configure stations for DHCP, I see cases where stations on a particular
>>>>> radio will not transmit anything sometimes. I see no 'XMIT' logs that show indication of
>>>>> frames being received in the driver from the upper stack, but if I use 'tshark' on
>>>>> a station interface, it shows frames being 'transmitted'.
>>>>>
>>>>> I do, however, see this, which looks like it might show
>>>>> an issue. It looks like whatever 'aqm' is, it has an ever expanding number
>>>>> of backlog packets:
>>>>
>>>> The aqm is the intermediate queues in mac80211. So this indicates that
>>>> the driver is not pulling packets for transmission.
>>>>
>>>> With that many stations, I wonder whether it is due to the airtime
>>>> fairness scheduler throttling the station? What is the contents of
>>>> debug/ieee80211/wiphy2/netdev\:sta30194/stations/00\:0e\:8e\:69\:b8\:f7/airtime
>>>> while the station is not transmitting? And is it all stations on that
>>>> particular radio, or only some of them?
>>>
>>> Here is the output of airtime and aqm on a hung station:
>>>
>>> # cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/airtime
>>> RX: 83706 us
>>> TX: 4202 us
>>> Deficit: VO: 198 us VI: 300 us BE: -8306 us BK: 300 us
>>
>> Right. This looks like incoming traffic is depleting the airtime quantum
>> faster than it can be replenished by the scheduler, which means that the
>> station gets completely starved.
>>
>> Could you try turning off the airtime scheduler?
>>
>> echo 0 > /sys/kernel/debug/ieee80211/wiphy0/ath9k/airtime_flags
>>
>> and see if the problem goes away.
>>
>> If it does, please check if the problem persists when setting
>> airtime_flags to 1 (which means only include TX airtime).
>>
>> -Toke
>>
>
> That did not seem to help:
>
> # cat /debug/ieee80211/wiphy0/netdev\:sta10058/stations/00\:0e\:8e\:50\:74\:8a/node_aggr
> Max-AMPDU: 65535
> MPDU Density: 8
>
>
> TID SEQ_START SEQ_NEXT BAW_SIZE BAW_HEAD BAW_TAIL BAR_IDX SCHED HAS-QUED
> 0 0 0 64 0 0 -1 1 1

Hmm, SCHED and HAS-QUED are both set, so it should be scheduled. Is the
scheduler maybe simply taking too long to get round to scheduling that
station again?
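
A quick sketch of that check done mechanically, assuming the node_aggr layout quoted above (two summary lines, then a header row and one row per TID). The function names are illustrative.

```python
def parse_node_aggr(text):
    """Parse the per-TID table of a node_aggr dump into a list of dicts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # Skip the "Max-AMPDU" / "MPDU Density" summary and find the table header.
    header_idx = next(i for i, ln in enumerate(lines) if ln.startswith("TID"))
    columns = lines[header_idx].split()
    return [dict(zip(columns, map(int, ln.split())))
            for ln in lines[header_idx + 1:]]


def scheduled_with_queued_frames(tids):
    """TIDs that are marked scheduled and also report queued frames."""
    return [t["TID"] for t in tids if t["SCHED"] == 1 and t["HAS-QUED"] == 1]


SAMPLE = """\
Max-AMPDU: 65535
MPDU Density: 8

TID SEQ_START SEQ_NEXT BAW_SIZE BAW_HEAD BAW_TAIL BAR_IDX SCHED HAS-QUED
0 0 0 64 0 0 -1 1 1
"""

if __name__ == "__main__":
    print(scheduled_with_queued_frames(parse_node_aggr(SAMPLE)))
```

A TID that stays in this list across repeated reads while its sequence numbers do not advance would point at the scheduler never getting round to it, rather than at an empty queue.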

What happens if you don't kill things after 30 seconds? Is it hanging
forever, or just long enough for your tools to lose patience?

If you have 200 stations all requesting DHCP addresses I could see how
things might take a while...

-Toke

2018-01-30 17:32:25

by Ben Greear

Subject: Re: ath9k will not tx packets sometimes.

On 01/30/2018 04:07 AM, Toke Høiland-Jørgensen wrote:
> Ben Greear <[email protected]> writes:
>
>> Maybe there is some way for the scheduler to get stuck and not
>> schedule anything?
>
> It would appear so, yeah. Do you do anything special other than
> associate a whole bunch of stations to trigger this? I can try to see if
> I can script something that works equivalently on my setup.

Ok, and I'll give you my user-space software package to easily set up
this test case if you want to test with that. Contact me off-list if
you want help setting this up.

>
> I'm actually working on reworking that whole scheduler logic, and move
> some of it into mac80211. Could you test this (WiP) patch and see if
> that has the same problem?

It had some serious conflicts in ath10k, due to my local changes, so
I did not actually test this.

But, a revert of the atf patches (a6e56d749 and 63fefa050) appears to have resolved the
issue. I'll test more with these reverted, and maybe will have time to
work more on actually fixing upstream code next time I move to a newer
kernel (and/or after your pending changes get in).

Thanks,
Ben


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2018-01-29 21:54:28

by Ben Greear

Subject: Re: ath9k will not tx packets sometimes.

On 01/29/2018 01:47 PM, Toke Høiland-Jørgensen wrote:
> Ben Greear <[email protected]> writes:
>
>> On 01/27/2018 05:11 AM, Toke Høiland-Jørgensen wrote:
>>> Ben Greear <[email protected]> writes:
>>>
>>>> I'm doing a test with 200 virtual stations on each of 6 ath9k radios.
>>>>
>>>> When I configure stations for DHCP, I see cases where stations on a particular
>>>> radio will not transmit anything sometimes. I see no 'XMIT' logs that show indication of
>>>> frames being received in the driver from the upper stack, but if I use 'tshark' on
>>>> a station interface, it shows frames being 'transmitted'.
>>>>
>>>> I do, however, see this, which looks like it might show
>>>> an issue. It looks like whatever 'aqm' is, it has an ever expanding number
>>>> of backlog packets:
>>>
>>> The aqm is the intermediate queues in mac80211. So this indicates that
>>> the driver is not pulling packets for transmission.
>>>
>>> With that many stations, I wonder whether it is due to the airtime
>>> fairness scheduler throttling the station? What is the contents of
>>> debug/ieee80211/wiphy2/netdev\:sta30194/stations/00\:0e\:8e\:69\:b8\:f7/airtime
>>> while the station is not transmitting? And is it all stations on that
>>> particular radio, or only some of them?
>>
>> Here is the output of airtime and aqm on a hung station:
>>
>> # cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/airtime
>> RX: 83706 us
>> TX: 4202 us
>> Deficit: VO: 198 us VI: 300 us BE: -8306 us BK: 300 us
>
> Right. This looks like incoming traffic is depleting the airtime quantum
> faster than it can be replenished by the scheduler, which means that the
> station gets completely starved.
>
> Could you try turning off the airtime scheduler?
>
> echo 0 > /sys/kernel/debug/ieee80211/wiphy0/ath9k/airtime_flags
>
> and see if the problem goes away.
>
> If it does, please check if the problem persists when setting
> airtime_flags to 1 (which means only include TX airtime).
>
> -Toke
>

That did not seem to help:

# cat /debug/ieee80211/wiphy0/netdev\:sta10058/stations/00\:0e\:8e\:50\:74\:8a/node_aggr
Max-AMPDU: 65535
MPDU Density: 8


TID SEQ_START SEQ_NEXT BAW_SIZE BAW_HEAD BAW_TAIL BAR_IDX SCHED HAS-QUED
0 0 0 64 0 0 -1 1 1

# cat /debug/ieee80211/wiphy0/netdev\:sta10058/stations/00\:0e\:8e\:50\:74\:8a/airtime
RX: 0 us
TX: 4682 us
Deficit: VO: 300 us VI: 300 us BE: 300 us BK: 300 us

# cat /debug/ieee80211/wiphy0/netdev\:sta10058/stations/00\:0e\:8e\:50\:74\:8a/aqm
target 49999us interval 299999us ecn no
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 2406 11 3 0 0 0 7 0 0 0x6(RUN AMPDU NO-AMSDU)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
7 0 0 0 12 0 0 0 13 312 13 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)


Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2018-01-30 18:55:28

by Toke Høiland-Jørgensen

Subject: Re: ath9k will not tx packets sometimes.

Ben Greear <[email protected]> writes:

>> I'm actually working on reworking that whole scheduler logic, and move
>> some of it into mac80211. Could you test this (WiP) patch and see if
>> that has the same problem?
>
> It had some serious conflicts in ath10k, due to my local changes, so
> I did not actually test this.

Can send you a version without the ath10k changes tomorrow if you'd like
to test - but will try to reproduce myself as well...

> But, a revert of the atf patches (a6e56d749 and 63fefa050) appear to
> have resolved the issue. I'll test more with these reverted, and maybe
> will have time to work more on actually fixing upstream code next time
> I move to a newer kernel (and/or after your pending changes get in).

Ah, that narrows it down some. Well, that is the code I'm hacking on
currently anyway, so let's see if we can't get it fixed as part of that
series :)

-Toke

2018-01-31 13:37:47

by Sebastian Gottschall

Subject: Re: ath9k will not tx packets sometimes.

On 31.01.2018 at 12:50, Toke Høiland-Jørgensen wrote:
> Sebastian Gottschall <[email protected]> writes:
>
>> On 30.01.2018 at 19:55, Toke Høiland-Jørgensen wrote:
>>> Ben Greear <[email protected]> writes:
>>>
>>>>> I'm actually working on reworking that whole scheduler logic, and move
>>>>> some of it into mac80211. Could you test this (WiP) patch and see if
>>>>> that has the same problem?
>>>> It had some serious conflicts in ath10k, due to my local changes, so
>>>> I did not actually test this.
>>> Can send you a version without the ath10k changes tomorrow if you'd like
>>> to test - but will try to reproduce myself as well...
>>>
>>>> But, a revert of the atf patches (a6e56d749 and 63fefa050) appear to
>>>> have resolved the issue. I'll test more with these reverted, and maybe
>>>> will have time to work more on actually fixing upstream code next time
>>>> I move to a newer kernel (and/or after your pending changes get in).
>>> Ah, that narrows it down some. Well, that is the code I'm hacking on
>>> currently anyway, so let's see if we can't get it fixed as part of that
>>> series :)
>> i have some addition information for you maybe. in the same timeframe i
>> noticed a increased memory usage for ath9k devices.
>> maybe that helps. so i hit memory boundaries on embedded devices with
>> dual interfaces and just 32 mb  ram now which wasnt the case before
>> is this patch worth to try from my side?
> This is probably because of the added queue space. Which is sort of by
> design. In 3ff23cd5654b9c8f4d567caa73439b4c39fbeaae we lowered the
> default limit for non-VHT devices to 4MB. But if you have several PHYs
> on a very memory constrained device you could still run out I guess.
>
> `echo fq_memory_limit 2097152 > /sys/kernel/debug/ieee80211/phy0/aqm`
> would limit it to 2MB for that phy...
What if I tried that already? :-)

Sebastian
>
> -Toke
>

--
Best regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Registered office: Stubenwaldallee 21a, 64625 Bensheim
Court of registration: Amtsgericht Darmstadt, HRB 25473
Managing directors: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: [email protected]
Tel.: +496251-582650 / Fax: +496251-5826565

2018-01-31 14:52:35

by Sebastian Gottschall

Subject: Re: ath9k will not tx packets sometimes.

On 31.01.2018 at 14:46, Toke Høiland-Jørgensen wrote:
> Sebastian Gottschall <[email protected]> writes:
>
>> On 31.01.2018 at 12:50, Toke Høiland-Jørgensen wrote:
>>> Sebastian Gottschall <[email protected]> writes:
>>>
>>>> On 30.01.2018 at 19:55, Toke Høiland-Jørgensen wrote:
>>>>> Ben Greear <[email protected]> writes:
>>>>>
>>>>>>> I'm actually working on reworking that whole scheduler logic, and move
>>>>>>> some of it into mac80211. Could you test this (WiP) patch and see if
>>>>>>> that has the same problem?
>>>>>> It had some serious conflicts in ath10k, due to my local changes, so
>>>>>> I did not actually test this.
>>>>> Can send you a version without the ath10k changes tomorrow if you'd like
>>>>> to test - but will try to reproduce myself as well...
>>>>>
>>>>>> But, a revert of the atf patches (a6e56d749 and 63fefa050) appear to
>>>>>> have resolved the issue. I'll test more with these reverted, and maybe
>>>>>> will have time to work more on actually fixing upstream code next time
>>>>>> I move to a newer kernel (and/or after your pending changes get in).
>>>>> Ah, that narrows it down some. Well, that is the code I'm hacking on
>>>>> currently anyway, so let's see if we can't get it fixed as part of that
>>>>> series :)
>>>> i have some addition information for you maybe. in the same timeframe i
>>>> noticed a increased memory usage for ath9k devices.
>>>> maybe that helps. so i hit memory boundaries on embedded devices with
>>>> dual interfaces and just 32 mb  ram now which wasnt the case before
>>>> is this patch worth to try from my side?
>>> This is probably because of the added queue space. Which is sort of by
>>> design. In 3ff23cd5654b9c8f4d567caa73439b4c39fbeaae we lowered the
>>> default limit for non-VHT devices to 4MB. But if you have several PHYs
>>> on a very memory constrained device you could still run out I guess.
>>>
>>> `echo fq_memory_limit 2097152 > /sys/kernel/debug/ieee80211/phy0/aqm`
>>> would limit it to 2MB for that phy...
>> what if i tried that already? :-)
> Hmm, then it's maybe a bug? Changing the limit makes no difference at
> all? Does your build include 0bfe649fbb133? What are values of the
Maybe it makes a difference, but I still run into OOM after a while.
> counters in /sys/kernel/debug/ieee80211/phy0/aqm ?
I will check the current state in the next few days. I haven't checked it
on the affected device in the last two months.

>
> -Toke
>

--
Best regards

Sebastian Gottschall / CTO

NewMedia-NET GmbH - DD-WRT
Registered office: Stubenwaldallee 21a, 64625 Bensheim
Court of registration: Amtsgericht Darmstadt, HRB 25473
Managing directors: Peter Steinhäuser, Christian Scheele
http://www.dd-wrt.com
email: [email protected]
Tel.: +496251-582650 / Fax: +496251-5826565

2018-01-31 11:50:22

by Toke Høiland-Jørgensen

Subject: Re: ath9k will not tx packets sometimes.

Sebastian Gottschall <[email protected]> writes:

> On 30.01.2018 at 19:55, Toke Høiland-Jørgensen wrote:
>> Ben Greear <[email protected]> writes:
>>
>>>> I'm actually working on reworking that whole scheduler logic, and move
>>>> some of it into mac80211. Could you test this (WiP) patch and see if
>>>> that has the same problem?
>>> It had some serious conflicts in ath10k, due to my local changes, so
>>> I did not actually test this.
>> Can send you a version without the ath10k changes tomorrow if you'd like
>> to test - but will try to reproduce myself as well...
>>
>>> But, a revert of the atf patches (a6e56d749 and 63fefa050) appear to
>>> have resolved the issue. I'll test more with these reverted, and maybe
>>> will have time to work more on actually fixing upstream code next time
>>> I move to a newer kernel (and/or after your pending changes get in).
>> Ah, that narrows it down some. Well, that is the code I'm hacking on
>> currently anyway, so let's see if we can't get it fixed as part of that
>> series :)
> i have some addition information for you maybe. in the same timeframe i
> noticed a increased memory usage for ath9k devices.
> maybe that helps. so i hit memory boundaries on embedded devices with
> dual interfaces and just 32 mb  ram now which wasnt the case before
> is this patch worth to try from my side?

This is probably because of the added queue space. Which is sort of by
design. In 3ff23cd5654b9c8f4d567caa73439b4c39fbeaae we lowered the
default limit for non-VHT devices to 4MB. But if you have several PHYs
on a very memory constrained device you could still run out I guess.

`echo fq_memory_limit 2097152 > /sys/kernel/debug/ieee80211/phy0/aqm`
would limit it to 2MB for that phy...
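
As a back-of-the-envelope sketch of that arithmetic: the snippet below builds the debugfs command for a given limit and shows what share of RAM the queues could claim in the worst case. Reading "dual interfaces" as two PHYs is an assumption, and the helper names are illustrative.

```python
MIB = 1024 * 1024


def fq_memory_limit_command(phy, limit_mib):
    """Build the shell command that sets the fq memory limit for one phy."""
    limit_bytes = limit_mib * MIB
    return (f"echo fq_memory_limit {limit_bytes} > "
            f"/sys/kernel/debug/ieee80211/{phy}/aqm")


def worst_case_queue_share(num_phys, limit_mib, ram_mib):
    """Fraction of RAM the queues could use if every phy hits its limit."""
    return num_phys * limit_mib / ram_mib


if __name__ == "__main__":
    print(fq_memory_limit_command("phy0", 2))
    # Two PHYs (assumed) on a 32 MB device, at the 4 MB default and at 2 MB.
    for limit in (4, 2):
        share = worst_case_queue_share(2, limit, 32)
        print(f"{limit} MB per phy: up to {share:.1%} of RAM")
```

Under those assumptions the 4 MB default could tie up a quarter of a 32 MB device, and a 2 MB limit halves that.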

-Toke

2018-01-29 23:00:34

by Ben Greear

Subject: Re: ath9k will not tx packets sometimes.

On 01/29/2018 02:35 PM, Toke Høiland-Jørgensen wrote:
> Ben Greear <[email protected]> writes:
>
>> On 01/29/2018 01:47 PM, Toke Høiland-Jørgensen wrote:
>>> Ben Greear <[email protected]> writes:
>>>
>>>> On 01/27/2018 05:11 AM, Toke Høiland-Jørgensen wrote:
>>>>> Ben Greear <[email protected]> writes:
>>>>>
>>>>>> I'm doing a test with 200 virtual stations on each of 6 ath9k radios.
>>>>>>
>>>>>> When I configure stations for DHCP, I see cases where stations on a particular
>>>>>> radio will not transmit anything sometimes. I see no 'XMIT' logs that show indication of
>>>>>> frames being received in the driver from the upper stack, but if I use 'tshark' on
>>>>>> a station interface, it shows frames being 'transmitted'.
>>>>>>
>>>>>> I do, however, see this, which looks like it might show
>>>>>> an issue. It looks like whatever 'aqm' is, it has an ever expanding number
>>>>>> of backlog packets:
>>>>>
>>>>> The aqm is the intermediate queues in mac80211. So this indicates that
>>>>> the driver is not pulling packets for transmission.
>>>>>
>>>>> With that many stations, I wonder whether it is due to the airtime
>>>>> fairness scheduler throttling the station? What is the contents of
>>>>> debug/ieee80211/wiphy2/netdev\:sta30194/stations/00\:0e\:8e\:69\:b8\:f7/airtime
>>>>> while the station is not transmitting? And is it all stations on that
>>>>> particular radio, or only some of them?
>>>>
>>>> Here is the output of airtime and aqm on a hung station:
>>>>
>>>> # cat /debug/ieee80211/wiphy0/netdev\:sta10057/stations/00\:0e\:8e\:50\:74\:8a/airtime
>>>> RX: 83706 us
>>>> TX: 4202 us
>>>> Deficit: VO: 198 us VI: 300 us BE: -8306 us BK: 300 us
>>>
>>> Right. This looks like incoming traffic is depleting the airtime quantum
>>> faster than it can be replenished by the scheduler, which means that the
>>> station gets completely starved.
>>>
>>> Could you try turning off the airtime scheduler?
>>>
>>> echo 0 > /sys/kernel/debug/ieee80211/wiphy0/ath9k/airtime_flags
>>>
>>> and see if the problem goes away.
>>>
>>> If it does, please check if the problem persists when setting
>>> airtime_flags to 1 (which means only include TX airtime).
>>>
>>> -Toke
>>>
>>
>> That did not seem to help:
>>
>> # cat /debug/ieee80211/wiphy0/netdev\:sta10058/stations/00\:0e\:8e\:50\:74\:8a/node_aggr
>> Max-AMPDU: 65535
>> MPDU Density: 8
>>
>>
>> TID SEQ_START SEQ_NEXT BAW_SIZE BAW_HEAD BAW_TAIL BAR_IDX SCHED HAS-QUED
>> 0 0 0 64 0 0 -1 1 1
>
> Hmm, SCHED and HAS-QUED are both set, so it should be scheduled. Is the
> scheduler maybe simply taking too long to get round to scheduling that
> station again?
>
> What happens if you don't kill things after 30 seconds? Is it hanging
> forever, or just long enough for your tools to lose patience?
>
> If you have 200 stations all requesting DHCP addresses I could see how
> things might take a while...

I bring them up in groups of 30 or so. I typically see 1-10 of them get
DHCP address, and then it seems that no data frames ever are tx'd again on
any interface on the radio...or at least tx is very rare. Sometimes, all 200 will come
up and pass traffic, but not reliably. Once the system gets in this state,
down/up of the affected station interfaces does not fix it. I have not tried
bouncing all of them at once yet.

I never even see dhcp discovers on the air when sniffing on another machine,
from any interface once it is hung, so it should not be a simple over-busy
network issue.

Maybe there is some way for the scheduler to get stuck and not schedule anything?

Thanks,
Ben

>
> -Toke
>


--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com

2018-01-31 13:46:08

by Toke Høiland-Jørgensen

Subject: Re: ath9k will not tx packets sometimes.

Sebastian Gottschall <[email protected]> writes:

> On 31.01.2018 at 12:50, Toke Høiland-Jørgensen wrote:
>> Sebastian Gottschall <[email protected]> writes:
>>
>>> On 30.01.2018 at 19:55, Toke Høiland-Jørgensen wrote:
>>>> Ben Greear <[email protected]> writes:
>>>>
>>>>>> I'm actually working on reworking that whole scheduler logic, and move
>>>>>> some of it into mac80211. Could you test this (WiP) patch and see if
>>>>>> that has the same problem?
>>>>> It had some serious conflicts in ath10k, due to my local changes, so
>>>>> I did not actually test this.
>>>> Can send you a version without the ath10k changes tomorrow if you'd like
>>>> to test - but will try to reproduce myself as well...
>>>>
>>>>> But, a revert of the atf patches (a6e56d749 and 63fefa050) appear to
>>>>> have resolved the issue. I'll test more with these reverted, and maybe
>>>>> will have time to work more on actually fixing upstream code next time
>>>>> I move to a newer kernel (and/or after your pending changes get in).
>>>> Ah, that narrows it down some. Well, that is the code I'm hacking on
>>>> currently anyway, so let's see if we can't get it fixed as part of that
>>>> series :)
>>> i have some addition information for you maybe. in the same timeframe i
>>> noticed a increased memory usage for ath9k devices.
>>> maybe that helps. so i hit memory boundaries on embedded devices with
>>> dual interfaces and just 32 mb  ram now which wasnt the case before
>>> is this patch worth to try from my side?
>> This is probably because of the added queue space. Which is sort of by
>> design. In 3ff23cd5654b9c8f4d567caa73439b4c39fbeaae we lowered the
>> default limit for non-VHT devices to 4MB. But if you have several PHYs
>> on a very memory constrained device you could still run out I guess.
>>
>> `echo fq_memory_limit 2097152 > /sys/kernel/debug/ieee80211/phy0/aqm`
>> would limit it to 2MB for that phy...
> what if i tried that already? :-)

Hmm, then it's maybe a bug? Changing the limit makes no difference at
all? Does your build include 0bfe649fbb133? What are values of the
counters in /sys/kernel/debug/ieee80211/phy0/aqm ?

-Toke