2012-11-16 17:41:19

by Chaoxing Lin

Subject: help: 802.11s bad performance with 802.11n enabled

I set up a 7-node 802.11s mesh network and am trying to evaluate its performance.

My first test evaluates packet loss.
The test utility is very simple: it pings all 7 nodes continuously and counts the ping replies. The ping rate is about 10 ping requests per second to each node.
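
The loss figure such a test reports can be sketched as below. This is only an illustration: the function name and the choice to clamp duplicate replies are assumptions, not part of the actual test utility.

```c
#include <stdio.h>

/*
 * Hypothetical helper: percentage of ping requests that got no reply.
 * 'sent' and 'received' are per-node counters kept by the test utility.
 */
static double ping_loss_percent(unsigned long sent, unsigned long received)
{
    if (sent == 0)
        return 0.0;            /* nothing sent yet, so nothing lost */
    if (received > sent)
        received = sent;       /* duplicate replies must not give negative loss */
    return (double)(sent - received) * 100.0 / (double)sent;
}

/* Print one summary line per node, e.g. at the end of an overnight run. */
static void print_loss(const char *node, unsigned long sent, unsigned long received)
{
    printf("%s: sent=%lu received=%lu loss=%.1f%%\n",
           node, sent, received, ping_loss_percent(sent, received));
}
```

With 1000 requests and 700 replies this yields 30% loss, which is the order of magnitude reported for the HT-enabled case.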

The radio is 802.11a on channel 40, in a clean RF environment; nobody else is on this channel.

When 802.11n is NOT enabled, the ping loss rate is very low. Only a few packets are lost during an overnight test.

However, when 802.11n (HT40+ or HT20) is enabled, the network is extremely unstable. The ping loss is about 30% or more to each node.

FYI, 802.11n itself seems to work well with 802.11s when there are only 2 nodes (standalone). I say so because I ran a throughput test on a 2-node mesh on channel 40 HT40+, and the throughput was good: iperf TCP throughput was about 170 Mbps out of 300 Mbps (2 streams).


Does anyone know what's going on?
Or has anyone done an 802.11s performance test and can share the test data, setup, etc.?


Thanks,

Chaoxing


2012-11-17 09:20:15

by Chun-Yeow Yeoh

Subject: Re: help: 802.11s bad performance with 802.11n enabled

Hi, Chaoxing

Do you have a network diagram to explain your setup? Are all the nodes able
to talk to each other directly?

What version of compat-wireless are you using?

---
Chun-Yeow

2012-12-04 04:41:24

by Thomas Pedersen

Subject: Re: help: 802.11s bad performance with 802.11n enabled

Hi Chaoxing,

On Mon, Dec 3, 2012 at 6:37 AM, Chaoxing Lin
<[email protected]> wrote:
> After a lot of experiments, here are various problems observed.
>
> 1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
> Is anyone looking at this issue? This issue is now very easy to recreate.
>
> 2. Security Key for peer link and mesh path messed up
>
> For example, in one case,
> Device A cannot ping device B but it can ping device C. And it is seen that telnet
> from device A to device C and from device C it can ping device B.
> This means device A actually can reach device B. But user has to do it manually
> (through a third device)
>
> Below is a "reachable graph" in one of the real scenario.
>
> 147 ----> 115
>     ----> 111 ------> 103
>               ------> 104
>               ------> 113
>               ------> 115
>
> Device 147 can only ping 115 and 111, although its mpath table says it has direct mpath to every node.
> But a telnet session from 147 to 111 can ping the rest devices 103, 104, 113, 115.
>
> Further analysis peer link between 147 and 104 reveals below.
>
> 147 has peer link to 104 in "ESTAB" and has all 3 keys (CCM pairwise, CMAC group key, CCM group key) installed for peer 104.
> But 104 has peer link to 147 in "LISTEN" and it does not have any keys installed for 147.
> That is to say, the peer link between 147 and 104 is bad. The worse thing is the mpath table on 147 keep saying the path to 104 is active. So all packets to 104 are sent to this peer link, but could not be decrypted on the other end.
>
> I run meshd-nl80211 compiled from auth-sae for the encryption. Does anyone know what's the problem here? Is this a protocol defect, e.g. failure to cover certain error condition? Or is it auth-sae/kernel implementation bug?
>
>
> 3. 802.11n packet aggregation plays a big role in 802.11s mesh network in-stability
>
> For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
> It's as stable as running 802.11a only mode.
> So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
> Any one has any idea why?

I just learned BA and BAR frames only have a 16 bit field for
"starting sequence number", while mesh uses 32-bit "mesh sequence
numbers". Try to investigate whether these two counters interact
properly.
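
For reference, a minimal sketch of the 16-bit field mentioned above, assuming the standard 802.11 Sequence Control layout: a 4-bit fragment number in the low bits and a 12-bit sequence number in the high bits. The Block Ack Starting Sequence Control field uses the same layout. The helper names are made up for illustration and are not mac80211 functions. The 32-bit mesh sequence number is carried separately in the Mesh Control field, so the two counters wrap at very different points.

```c
#include <stdbool.h>
#include <stdint.h>

#define SN_MODULO 4096u              /* 12-bit sequence number space */
#define SN_MASK   (SN_MODULO - 1u)

/* Extract the 12-bit sequence number from a 16-bit Sequence Control value. */
static uint16_t seq_ctrl_to_sn(uint16_t seq_ctrl)
{
    return (uint16_t)(seq_ctrl >> 4);   /* drop the 4-bit fragment number */
}

/* Next sequence number, wrapping from 4095 back to 0. */
static uint16_t sn_next(uint16_t sn)
{
    return (uint16_t)((sn + 1u) & SN_MASK);
}

/* True if 'a' comes before 'b' in the circular 12-bit space. */
static bool sn_less(uint16_t a, uint16_t b)
{
    return ((uint16_t)(a - b) & SN_MASK) > (SN_MODULO >> 1);
}

/* True if 'sn' falls inside a Block Ack window of 'win' frames starting at 'ssn'. */
static bool sn_in_window(uint16_t ssn, uint16_t sn, uint16_t win)
{
    return ((uint16_t)(sn - ssn) & SN_MASK) < win;
}
```

For example, with a starting sequence number of 4090 and a 64-frame window, sequence number 5 is inside the window because the counter wraps at 4096.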

Thomas

2012-12-10 19:14:07

by Thomas Pedersen

Subject: Re: help: 802.11s bad performance with 802.11n enabled

On Mon, Dec 10, 2012 at 7:48 AM, Chaoxing Lin
<[email protected]> wrote:
> TP>
> TP>Are you talking about a different bug?
>
> GY> Hm, may bee, but according to Chaoxing Lin emails there is several bugs which cause performance degradation in 802.11s mode, and symptoms in my case indentical, i get same results as Chaoxing Lin, and seems same throbles, i will make tests what you want anyway and report results.
>
> For easy reference, I summarize the 4 problems I uncovered so far that contribute to in-stability of 7-node 802.11s network.
>
> 1. ath9k "Tx DMA error". Ping packet loss is seen each time "Fail to stop Tx DMA" log is seen. It's NOT the main cause.
>
> 2. authsae or 802.11s kernel problem: The two ends of a peer link get out of sync for whatever reason. One end says, the peer link is "ESTAB" and all 3 keys are in place. While the other end says this peer link is not "ESTAB", no keys installed for the peer.

We recently applied
https://github.com/cozybit/authsae/commit/0e5c65c3f773db820d6cee7b365cd4a70181c72d
which may fix your issue.

> 3. AES-CCM pairwise key sometimes complains packet replay so ping packets are dropped. A kernel key dump in this error case is below. (I overwrote key_key_read() function in debugfs_key.c to dump all info)
>
> Key 362:
> 0xcf393800 AES-CCM Key: 49305a736a8b6d5fcb34057ee6983d44 Pairwise
> Peer MAC: 00:0e:8e:38:36:03
> tx_pn: 000000000000009f
>
> rx_pn[ 0]: 0000000d788b  rx_pn[ 1]: 000000000000  rx_pn[ 2]: 000000000000  rx_pn[ 3]: 000000000000
> rx_pn[ 4]: 000000000000  rx_pn[ 5]: 000000000000  rx_pn[ 6]: 000000000000  rx_pn[ 7]: 000000000000
> rx_pn[ 8]: 000000000000  rx_pn[ 9]: 000000000000  rx_pn[10]: 000000000000  rx_pn[11]: 000000000000
> rx_pn[12]: 000000000000  rx_pn[13]: 000000000000  rx_pn[14]: 000000000000  rx_pn[15]: 000000000000
> rx_pn[16]: 000000003580
>
> replays: 11970 icverror: <=======================problem here===========
>
> The worse thing for problem 2 and 3 above is, when it gets into this state, the mpath still stays active. So all packets are still routed to the bad peer link/mpath and will be dropped by peer.

ok. Patches are welcome.

> 4. 802.11n packet aggregation. I believe this is the main problem by the fact that, disabling 802.11n packet aggregation in ath9k driver will make the network stable and problem 2 and 3 are not seen. In other words, problem 2 and 3 may be caused by aggregation (my imagination, aggregation caused certain error condition that is not handled properly, which triggers problem 2 and 3)

And to reproduce, you run simultaneous pings from one node to ~6
others? It will take me a few days to find time to reproduce this, so
any interesting observations you can offer in the meantime would be
helpful.

Thanks,
Thomas

2012-12-03 15:47:37

by Chaoxing Lin

Subject: RE: help: 802.11s bad performance with 802.11n enabled


>On first look in my case disabled aggregation reduces packet loss, link is more reliable, but it's also drop throughput to 15-20 Mbits/sec from about ~50 with aggregation enabled.

Yes. This is the penalty of disabling aggregation. It's just a way to narrow down where the problem is.

Can any expert on 802.11n aggregation and/or 802.11s tell us what's going on?

2012-12-08 03:23:33

by Georgiewskiy Yuriy

Subject: Re: help: 802.11s bad performance with 802.11n enabled

On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:

OK, I will try this and report the results. Can you also test on 2.4 GHz? As I
understand it, ch 149 is 802.11a. Or does this make no sense here?

TP>Hi Chaoxing and Georgiewsky,
TP>
TP>On Mon, Dec 3, 2012 at 6:45 AM, Georgiewskiy Yuriy <[email protected]> wrote:
TP>> On 2012-12-03 14:37 -0000, Chaoxing Lin wrote [email protected]:
TP>>
TP>> CL>After a lot of experiments, here are various problems observed.
TP>> CL>
TP>> CL>1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
TP>> CL>Is anyone looking at this issue? This issue is now very easy to recreate.
TP>>
TP>> In my case it much more than 3%.
TP>
TP>With wireless-testing HEAD (671c924) I made the following observations
TP>with 3 nodes in a mesh using ch. 149 HT20 on AR9280.
TP>
TP>1. ping -i0.1 does not cause aggregation to take place, and losses are 0%
TP>2. a UDP iperf test with two nodes generating traffic shows losses
TP>around 1%. We can observe aggregation taking place in this case.
TP>
TP>Can either of you guys reproduce this with the latest
TP>wireless-testing? Also please CC [email protected] on any
TP>mesh bugs in the future.
TP>
TP>Thanks!
TP>Thomas
TP>--
TP>To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
TP>the body of a message to [email protected]
TP>More majordomo info at http://vger.kernel.org/majordomo-info.html
TP>

With Best Regards
Georgiewskiy Yuriy
+7 4872 711666
fax +7 4872 711143
IT Service Ltd
http://nkoort.ru
JID: [email protected]
YG129-RIPE

2012-12-03 19:02:17

by Chaoxing Lin

Subject: RE: help: 802.11s bad performance with 802.11n enabled

My test is very simple.

Continuous ping at about 10 pings per node per second.
The ping size is 64 bytes.


-----Original Message-----
From: Paul Stoaks [mailto:[email protected]]
Sent: Monday, December 03, 2012 1:22 PM
To: 'Georgiewskiy Yuriy'; Chaoxing Lin
Cc: [email protected]
Subject: RE: help: 802.11s bad performance with 802.11n enabled

What kind of traffic are you pushing through (packet sizes?) Are they fixed size, fixed rate, or ...?

Paul


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Georgiewskiy Yuriy
Sent: Monday, December 03, 2012 7:44 AM
To: Chaoxing Lin
Cc: [email protected]
Subject: RE: help: 802.11s bad performance with 802.11n enabled

On 2012-12-03 14:56 -0000, Chaoxing Lin wrote Georgiewskiy Yuriy:

CL>CL>For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
CL>CL>It's as stable as running 802.11a only mode.
CL>CL>So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
CL>CL>Any one has any idea why?
CL>
CL>Can you post a patch? i want test this too.
CL>
CL>The change is easy
CL>In ath9k/init.c
CL>Function ath9k_set_hw_capab()
CL>Replace below
CL>        if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_HT)
CL>                hw->flags |= IEEE80211_HW_AMPDU_AGGREGATION;
CL>with
CL>        hw->flags &= ~IEEE80211_HW_AMPDU_AGGREGATION;

At first look, in my case disabling aggregation reduces packet loss and the link is more reliable, but it also drops throughput to 15-20 Mbits/sec from about 50 with aggregation enabled.



2012-12-03 18:33:26

by Georgiewskiy Yuriy

Subject: RE: help: 802.11s bad performance with 802.11n enabled

On 2012-12-03 10:21 -0800, Paul Stoaks wrote 'Georgiewskiy Yuriy' and...':

I test with ping -A, with sizes of 64 and 1500 bytes, and with iperf using default parameters plus
just -c -i1 -t100500.

PS>What kind of traffic are you pushing through (packet sizes?) Are they fixed
PS>size, fixed rate, or ...?
PS>
PS>Paul

2012-12-10 15:48:56

by Chaoxing Lin

Subject: RE: help: 802.11s bad performance with 802.11n enabled

TP>
TP>Are you talking about a different bug?

GY> Hm, may bee, but according to Chaoxing Lin emails there is several bugs which cause performance degradation in 802.11s mode, and symptoms in my case indentical, i get same results as Chaoxing Lin, and seems same throbles, i will make tests what you want anyway and report results.

For easy reference, here is a summary of the 4 problems I have uncovered so far that contribute to the instability of the 7-node 802.11s network.

1. ath9k "Tx DMA error". Ping packet loss is seen each time the "Fail to stop Tx DMA" log is seen. It's NOT the main cause.

2. authsae or 802.11s kernel problem: the two ends of a peer link get out of sync for whatever reason. One end says the peer link is "ESTAB" and all 3 keys are in place, while the other end says this peer link is not "ESTAB" and no keys are installed for the peer.

3. The AES-CCM pairwise key sometimes complains about packet replay, so ping packets are dropped. A kernel key dump in this error case is below. (I overwrote the key_key_read() function in debugfs_key.c to dump all info.)

    Key 362:
    0xcf393800 AES-CCM Key: 49305a736a8b6d5fcb34057ee6983d44   Pairwise
    Peer MAC: 00:0e:8e:38:36:03
    tx_pn: 000000000000009f

    rx_pn[ 0]: 0000000d788b  rx_pn[ 1]: 000000000000  rx_pn[ 2]: 000000000000  rx_pn[ 3]: 000000000000
    rx_pn[ 4]: 000000000000  rx_pn[ 5]: 000000000000  rx_pn[ 6]: 000000000000  rx_pn[ 7]: 000000000000
    rx_pn[ 8]: 000000000000  rx_pn[ 9]: 000000000000  rx_pn[10]: 000000000000  rx_pn[11]: 000000000000
    rx_pn[12]: 000000000000  rx_pn[13]: 000000000000  rx_pn[14]: 000000000000  rx_pn[15]: 000000000000
    rx_pn[16]: 000000003580

    replays: 11970 icverror:  <=======================problem here===========

What makes problems 2 and 3 worse is that, once a link gets into this state, the mpath still stays active. So all packets are still routed to the bad peer link/mpath and are dropped by the peer.

4. 802.11n packet aggregation. I believe this is the main problem, because disabling 802.11n packet aggregation in the ath9k driver makes the network stable, and problems 2 and 3 are no longer seen. In other words, problems 2 and 3 may be caused by aggregation (my guess: aggregation causes a certain error condition that is not handled properly, which triggers problems 2 and 3).
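
To make the "replays" counter in problem 3 concrete, here is a minimal sketch of a CCMP-style replay check. It assumes a 6-byte packet number (PN) stored most significant byte first and 17 receive slots, as in the dump. The struct and function names are hypothetical and are not the mac80211 ones. A frame is accepted only if its PN is strictly greater than the last PN accepted on the same slot; anything else is counted as a replay and dropped.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PN_LEN      6    /* CCMP packet number is 48 bits */
#define NUM_RX_SLOTS 17  /* assumption: matches the 17 rx_pn[] entries in the dump */

/* Hypothetical per-key receive state. */
struct rx_key_state {
    uint8_t  rx_pn[NUM_RX_SLOTS][PN_LEN]; /* last accepted PN per slot, big-endian */
    uint32_t replays;                     /* frames rejected by the check below */
};

/*
 * Returns true if the frame passes the replay check and records its PN.
 * Because the PN is big-endian, memcmp() orders it numerically.
 */
static bool ccmp_pn_accept(struct rx_key_state *k, unsigned int slot,
                           const uint8_t pn[PN_LEN])
{
    if (slot >= NUM_RX_SLOTS)
        return false;
    if (memcmp(pn, k->rx_pn[slot], PN_LEN) <= 0) {
        k->replays++;          /* PN did not advance: treat as replay */
        return false;
    }
    memcpy(k->rx_pn[slot], pn, PN_LEN);
    return true;
}
```

Under a check like this, frames that arrive with a PN lower than one already accepted on the same slot are all dropped, whatever the reason for the ordering. Whether that is what happens with aggregation in this testbed is not established in the thread.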

2012-12-08 03:29:17

by Georgiewskiy Yuriy

Subject: Re: help: 802.11s bad performance with 802.11n enabled

On 2012-12-08 07:23 +0400, Georgiewskiy Yuriy wrote Thomas Pedersen:

GY>On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:
GY>
GY>OK, i try this and report results, can you also test on 2.4 Ggz? as i
GY>understand ch 149 is 802.11a? or this make no sense here?

Signal level also matters in my case. I just removed the antennas from one of the nodes,
within a range of 3 meters, so it works only with pigtails. The signal drops to between -70 and -80 dBm, and this
triggers "Fail to stop Tx DMA" immediately.


2012-12-08 03:17:23

by Thomas Pedersen

Subject: Re: help: 802.11s bad performance with 802.11n enabled

Hi Chaoxing and Georgiewskiy,

On Mon, Dec 3, 2012 at 6:45 AM, Georgiewskiy Yuriy <[email protected]> wrote:
> On 2012-12-03 14:37 -0000, Chaoxing Lin wrote [email protected]:
>
> CL>After a lot of experiments, here are various problems observed.
> CL>
> CL>1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
> CL>Is anyone looking at this issue? This issue is now very easy to recreate.
>
> In my case it much more than 3%.

With wireless-testing HEAD (671c924) I made the following observations
with 3 nodes in a mesh using ch. 149 HT20 on AR9280.

1. ping -i0.1 does not cause aggregation to take place, and losses are 0%
2. a UDP iperf test with two nodes generating traffic shows losses
around 1%. We can observe aggregation taking place in this case.

Can either of you guys reproduce this with the latest
wireless-testing? Also please CC [email protected] on any
mesh bugs in the future.

Thanks!
Thomas

2012-12-03 15:22:20

by Georgiewskiy Yuriy

Subject: RE: help: 802.11s bad performance with 802.11n enabled

On 2012-12-03 14:37 -0000, Chaoxing Lin wrote [email protected]:

CL>After a lot of experiments, here are various problems observed.
CL>
CL>1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
CL>Is anyone looking at this issue? This issue is now very easy to recreate.

In my case it is much more than 3%.

CL>
CL>2. Security Key for peer link and mesh path messed up
CL>
CL>For example, in one case,
CL>Device A cannot ping device B but it can ping device C. And it is seen that telnet
CL>from device A to device C and from device C it can ping device B.
CL>This means device A actually can reach device B. But user has to do it manually
CL>(through a third device)
CL>
CL>Below is a "reachable graph" in one of the real scenario.
CL>
CL>147 ----> 115
CL>    ----> 111 ------> 103
CL>              ------> 104
CL>              ------> 113
CL>              ------> 115
CL>
CL>Device 147 can only ping 115 and 111, although its mpath table says it has direct mpath to every node.
CL>But a telnet session from 147 to 111 can ping the rest devices 103, 104, 113, 115.
CL>
CL>Further analysis peer link between 147 and 104 reveals below.
CL>
CL>147 has peer link to 104 in "ESTAB" and has all 3 keys (CCM pairwise, CMAC group key, CCM group key) installed for peer 104.
CL>But 104 has peer link to 147 in "LISTEN" and it does not have any keys installed for 147.
CL>That is to say, the peer link between 147 and 104 is bad. The worse thing is the mpath table on 147 keep saying the path to 104 is active. So all packets to 104 are sent to this peer link, but could not be decrypted on the other end.
CL>
CL>I run meshd-nl80211 compiled from auth-sae for the encryption. Does anyone know what's the problem here? Is this a protocol defect, e.g. failure to cover certain error condition? Or is it auth-sae/kernel implementation bug?
CL>
CL>
CL>3. 802.11n packet aggregation plays a big role in 802.11s mesh network in-stability
CL>
CL>For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
CL>It's as stable as running 802.11a only mode.
CL>So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
CL>Any one has any idea why?

Can you post a patch? I want to test this too.


2012-12-08 03:47:30

by Georgiewskiy Yuriy

Subject: Re: help: 802.11s bad performance with 802.11n enabled

On 2012-12-07 19:37 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:

TP>On Fri, Dec 7, 2012 at 7:29 PM, Georgiewskiy Yuriy <[email protected]> wrote:
TP>> On 2012-12-08 07:23 +0400, Georgiewskiy Yuriy wrote Thomas Pedersen:
TP>>
TP>> GY>On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:
TP>> GY>
TP>> GY>OK, i try this and report results, can you also test on 2.4 Ggz? as i
TP>> GY>understand ch 149 is 802.11a? or this make no sense here?
TP>>
TP>> and also signal level make sense in my case, i just remove antennas from one of the nodes
TP>> in range of 3 meters, it works only with pigtails, signal drops to -70 - -80 dbm, and it's
TP>> triggers filed to stop tx dma immediatlly.
TP>
TP>Are you talking about a different bug?

Hm, maybe, but according to Chaoxing Lin's emails there are several bugs that cause
performance degradation in 802.11s mode, and the symptoms in my case are identical: I get the same results
as Chaoxing Lin, and it seems to be the same troubles. I will run the tests you want anyway and report
the results.


2012-12-03 14:56:24

by Chaoxing Lin

Subject: RE: help: 802.11s bad performance with 802.11n enabled

CL>For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
CL>It's as stable as running 802.11a only mode.
CL>So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
CL>Any one has any idea why?

Can you post a patch? I want to test this too.

The change is easy. In ath9k/init.c, in the function ath9k_set_hw_capab(), replace

        if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_HT)
                hw->flags |= IEEE80211_HW_AMPDU_AGGREGATION;

with

        hw->flags &= ~IEEE80211_HW_AMPDU_AGGREGATION;


2012-12-04 08:03:57

by Adrian Chadd

Subject: Re: help: 802.11s bad performance with 802.11n enabled

... well, how do you implement aggregation? Aggregation also has a 16
bit sequence space (the 802.11 seqno..)



Adrian

On 3 December 2012 20:35, Thomas Pedersen <[email protected]> wrote:
> I just learned BA and BAR frames only have a 16 bit field for
> "starting sequence number", while mesh uses 32-bit "mesh sequence
> numbers". Try to investigate whether these two counters interact
> properly.
>
> Thomas

2012-12-03 14:38:01

by Chaoxing Lin

Subject: RE: help: 802.11s bad performance with 802.11n enabled

After a lot of experiments, here are the various problems I have observed.

1. The "Fail to stop Tx DMA" issue plays a role, but not the major one. It accounts for about 3% of the packet loss in my testbed.
Is anyone looking at this issue? It is now very easy to recreate.

2. Security keys for the peer link and the mesh path get messed up

For example, in one case,
device A cannot ping device B, but it can ping device C. If I telnet
from device A to device C, then from device C I can ping device B.
This means device A can actually reach device B, but the user has to do it manually
(through a third device).

Below is a "reachability graph" from one real scenario.

147 ----> 115
    ----> 111 ------> 103
              ------> 104
              ------> 113
              ------> 115

Device 147 can only ping 115 and 111, although its mpath table says it has a direct mpath to every node.
But from a telnet session from 147 to 111, I can ping the remaining devices 103, 104, 113, and 115.

Further analysis of the peer link between 147 and 104 reveals the following.

147 has its peer link to 104 in "ESTAB" and has all 3 keys (CCM pairwise, CMAC group key, CCM group key) installed for peer 104.
But 104 has its peer link to 147 in "LISTEN" and does not have any keys installed for 147.
That is to say, the peer link between 147 and 104 is bad. Worse, the mpath table on 147 keeps saying the path to 104 is active, so all packets to 104 are sent over this peer link but cannot be decrypted at the other end.

I run meshd-nl80211 compiled from auth-sae for the encryption. Does anyone know what the problem is here? Is this a protocol defect, e.g. a failure to cover a certain error condition? Or is it an auth-sae/kernel implementation bug?


3. 802.11n packet aggregation plays a big role in 802.11s mesh network in-stability

For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
It's as stable as running 802.11a only mode.
So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
Any one has any idea why?





2012-12-10 15:23:11

by Chaoxing Lin

[permalink] [raw]
Subject: RE: help: 802.11s bad performance with 802.11n enabled

Thanks Thomas.

TP> With wireless-testing HEAD (671c924) I made the following observations with 3 nodes in a mesh using ch. 149 HT20 on AR9280.

3 nodes may not be enough to see the problem.

TP> 1. ping -i0.1 does not cause aggregation to take place, and losses are 0%

My tool pings each node (7 nodes in total) at about 10~15 packets/second. Maybe in theory it should not aggregate.
The fact is that there is a huge difference in ping loss with and without aggregation.

If you are interested in my test tool, it's here.
http://sites.google.com/site/ebaylinkan5709pictures/files-to-share/clinmonitor.gz
It's an executable that runs on any 32-bit Linux.
I lost the source code for this tool and only found the executable on my old machine.


TP> 2. a UDP iperf test with two nodes generating traffic shows losses around 1%. We can observe aggregation taking place in this case.

For all the kernel versions I have tested, I did not see a problem with a two-node 802.11s network.
Before the stability test, I did fairly extensive two-node throughput tests and did not see any problem in overnight tests: 150 ~ 220 Mbps TCP throughput (varying across different Atheros 11n chipsets).


2012-12-08 03:37:22

by Thomas Pedersen

[permalink] [raw]
Subject: Re: help: 802.11s bad performance with 802.11n enabled

On Fri, Dec 7, 2012 at 7:23 PM, Georgiewskiy Yuriy <[email protected]> wrote:
> On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:
>
> OK, I'll try this and report results. Can you also test on 2.4 GHz? As I
> understand it, ch 149 is 802.11a? Or does it make no difference here?

Yes, I get similar results on the 2.4 GHz band, and no, it shouldn't make
a difference here :)

> TP>Hi Chaoxing and Georgiewsky,
> TP>
> TP>On Mon, Dec 3, 2012 at 6:45 AM, Georgiewskiy Yuriy <[email protected]> wrote:
> TP>> On 2012-12-03 14:37 -0000, Chaoxing Lin wrote [email protected]:
> TP>>
> TP>> CL>After a lot of experiments, here are various problems observed.
> TP>> CL>
> TP>> CL>1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
> TP>> CL>Is anyone looking at this issue? This issue is now very easy to recreate.
> TP>>
> TP>> In my case it much more than 3%.
> TP>
> TP>With wireless-testing HEAD (671c924) I made the following observations
> TP>with 3 nodes in a mesh using ch. 149 HT20 on AR9280.
> TP>
> TP>1. ping -i0.1 does not cause aggregation to take place, and losses are 0%
> TP>2. a UDP iperf test with two nodes generating traffic shows losses
> TP>around 1%. We can observe aggregation taking place in this case.
> TP>
> TP>Can either of you guys reproduce this with the latest
> TP>wireless-testing? Also please CC [email protected] on any
> TP>mesh bugs in the future.
> TP>
> TP>Thanks!
> TP>Thomas
>
> With Best Regards,
> Georgiewskiy Yuriy
> +7 4872 711666
> fax +7 4872 711143
> IT Service Ltd
> http://nkoort.ru
> JID: [email protected]
> YG129-RIPE

2012-12-10 15:11:41

by Chaoxing Lin

[permalink] [raw]
Subject: RE: help: 802.11s bad performance with 802.11n enabled



-----Original Message-----
From: Georgiewskiy Yuriy [mailto:[email protected]]
Sent: Monday, December 03, 2012 9:45 AM
To: Chaoxing Lin
Cc: [email protected]
Subject: RE: help: 802.11s bad performance with 802.11n enabled

On 2012-12-03 14:37 -0000, Chaoxing Lin wrote [email protected]:

CL>After a lot of experiments, here are various problems observed.
CL>
CL>1. The "Fail to stop Tx DMA" related issue plays a role. But not the major part. It accounts for about 3% of packet loss in my testbed.
CL>Is anyone looking at this issue? This issue is now very easy to recreate.

GY>>In my case it much more than 3%.

When I say 3% loss due to "Tx DMA", it was measured by eliminating other factors as much as possible (turning off aggregation, etc.). Below are the numbers from a 3-day continuous test. During this test, "Fail to stop Tx DMA" happened once in a while on various nodes.



CLIN network activity/stability monitoring system

Total targets: 6 Total instability monitored: 3221

192.168.5.103 ICMP Tx 5899087 Rx 5883235 Seq=846 OutOfOrder 427 Pkt loss 15852(0.27%) RTT min/avg/max = 55.0/57.9/11580.0 ms
192.168.5.104 ICMP Tx 5899087 Rx 5721023 Seq=846 OutOfOrder 121213 Pkt loss 178064(3.02%) RTT min/avg/max = 54.0/57.7/9032.0 ms
192.168.5.111 ICMP Tx 5899087 Rx 5726094 Seq=846 OutOfOrder 164950 Pkt loss 172993(2.93%) RTT min/avg/max = 54.0/59.0/9421.0 ms
192.168.5.113 ICMP Tx 5899087 Rx 5894984 Seq=846 OutOfOrder 686 Pkt loss 4103(0.07%) RTT min/avg/max = 54.0/58.0/11524.0 ms
192.168.5.115 ICMP Tx 5899087 Rx 5869967 Seq=846 OutOfOrder 66782 Pkt loss 29120(0.49%) RTT min/avg/max = 54.0/67.7/11801.0 ms
192.168.5.147 ICMP Tx 5899087 Rx 5899086 Seq=846 OutOfOrder 0 Pkt loss 1(0.00%) RTT min/avg/max = 0.0/54.0/110.0 ms



Bad Packets: 0 short pkt, 3217664 not-my-echo, 0 not-echo-reply, 0 unknown sender
Application Starts: Thu Dec 6 13:59:58 2012
Current Time: Mon Dec 10 08:50:34 2012
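
For readers checking the numbers: the loss column in the report above is consistent with (Tx - Rx) / Tx. A minimal C sketch of that arithmetic follows. It assumes this is how the percentage is derived (the tool's source is not available), and the name loss_percent is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: packet loss as a percentage of transmitted requests.
 * Assumes loss = tx - rx, which matches the figures in the report above. */
double loss_percent(uint64_t tx, uint64_t rx)
{
    /* Guard against division by zero and inconsistent counters. */
    if (tx == 0 || rx > tx)
        return 0.0;
    return 100.0 * (double)(tx - rx) / (double)tx;
}
```

For 192.168.5.104, loss_percent(5899087, 5721023) comes out at roughly 3.02, in line with the 3.02% shown in the report.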


Notes:
1. Ignore the big numbers in the "OutOfOrder" column. When the ICMP sequence number is about to overflow (65535), if the last few packets (e.g. ICMP seq 65535) get lost, all packets after the overflow are counted as out of order.
2. If anyone is interested in the test tool I used in this test, it's here:
http://sites.google.com/site/ebaylinkan5709pictures/files-to-share/clinmonitor.gz
It's an executable that runs on any 32-bit Linux.
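
Regarding note 1: a miscount at the 65535 -> 0 wrap can usually be avoided by comparing sequence numbers modulo 2^16 (serial number arithmetic) instead of as plain integers. The following is only a sketch of that idea, not the tool's actual code, and seq16_newer is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if 16-bit sequence number a is "newer" than b, treating the
 * counter as circular: a is newer when it is 1..32767 steps ahead of b. */
bool seq16_newer(uint16_t a, uint16_t b)
{
    /* Unsigned subtraction wraps modulo 65536, so 0 - 65534 yields 2. */
    uint16_t diff = (uint16_t)(a - b);
    return diff != 0 && diff < 0x8000;
}
```

With this comparison, a reply with sequence 0 arriving after 65534 is treated as newer even if 65535 was lost, so the packets after the wrap would not be flagged as out of order.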


2012-12-03 15:43:42

by Georgiewskiy Yuriy

[permalink] [raw]
Subject: RE: help: 802.11s bad performance with 802.11n enabled

On 2012-12-03 14:56 -0000, Chaoxing Lin wrote Georgiewskiy Yuriy:

CL>CL>For experiment, I changed ath9k driver to disable 802.11n packet aggregation. The network becomes much better.
CL>CL>It's as stable as running 802.11a only mode.
CL>CL>So it seems that the aggregation plays a big role in in-stability of 802.11s network with 802.11n.
CL>CL>Any one has any idea why?
CL>
CL>Can you post a patch? i want test this too.
CL>
CL>The change is easy.
CL>In ath9k/init.c, function ath9k_set_hw_capab(),
CL>replace
CL>    if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_HT)
CL>        hw->flags |= IEEE80211_HW_AMPDU_AGGREGATION;
CL>with
CL>    hw->flags &= ~IEEE80211_HW_AMPDU_AGGREGATION;

At first look, in my case disabling aggregation reduces packet loss and
the link is more reliable, but it also drops throughput to 15-20 Mbit/s
from about 50 Mbit/s with aggregation enabled.

With Best Regards,
Georgiewskiy Yuriy
+7 4872 711666
fax +7 4872 711143
IT Service Ltd
http://nkoort.ru
JID: [email protected]
YG129-RIPE

2012-12-03 18:21:59

by Paul Stoaks

[permalink] [raw]
Subject: RE: help: 802.11s bad performance with 802.11n enabled

What kind of traffic are you pushing through (packet sizes?) Are they fixed
size, fixed rate, or ...?

Paul




2012-12-03 16:33:34

by Chaoxing Lin

[permalink] [raw]
Subject: RE: help: 802.11s bad performance with 802.11n enabled

A 4th problem is that the AES-CCM pairwise key complains about "packet replay".
All keys are good and the mpaths are good, but pings get no reply and the counter in /debugfs/ieee80211/phy0/keys/[key-num]/replays keeps going up.

I have seen this problem many times when running 802.11s traffic.
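
For context, CCMP replay protection drops any frame whose 48-bit packet number (PN) is not strictly greater than the last PN accepted for that key, and a replay counter is bumped each time a frame fails that check. Below is a simplified C sketch of the rule. The struct and function names are hypothetical, and this is not the mac80211 code, which keeps more state than shown here.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-key receive state. */
struct rx_replay_state {
    uint64_t last_pn;   /* last accepted PN; only the low 48 bits are used */
    uint64_t replays;   /* number of frames rejected as replays */
};

/* Accept a frame only if its PN is strictly greater than the last accepted
 * PN; otherwise count it as a replay and reject it. */
bool ccmp_pn_accept(struct rx_replay_state *st, uint64_t pn)
{
    pn &= 0xFFFFFFFFFFFFULL;    /* PN is a 48-bit value */
    if (pn <= st->last_pn) {
        st->replays++;
        return false;
    }
    st->last_pn = pn;
    return true;
}
```

Under this rule, if the receiver's stored PN ever ends up ahead of the sender's counter, every later frame is counted as a replay until the sender catches up. That would look like the symptom described, but whether it is what happens here is not established.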



2012-12-10 19:28:12

by Chaoxing Lin

[permalink] [raw]
Subject: RE: help: 802.11s bad performance with 802.11n enabled


> 4. 802.11n packet aggregation. I believe this is the main problem by
> the fact that, disabling 802.11n packet aggregation in ath9k driver
> will make the network stable and problem 2 and 3 are not seen. In
> other words, problem 2 and 3 may be caused by aggregation (my
> imagination, aggregation caused certain error condition that is not
> handled properly, which triggers problem 2 and 3)

TP> And to reproduce you run a simultaneous ping from one node to ~6 others? It will take me a few days to find time to reproduce this, so any interesting observations you can offer in the mean time would be helpful.

Yes, I run a simultaneous ping from one node to all 6 other nodes.

No, when 802.11n is enabled, the ping loss is seen fairly quickly (within a few minutes). Problems 2 and 3 are not that predictable. But once the network is in that state, it stays stuck there and gives me enough time to troubleshoot.

I posted the results of a test that ran for a few days just to show that disabling 802.11n really makes the network stable, rather than "stable by chance" over a short period.

2012-12-08 03:37:44

by Thomas Pedersen

[permalink] [raw]
Subject: Re: help: 802.11s bad performance with 802.11n enabled

On Fri, Dec 7, 2012 at 7:29 PM, Georgiewskiy Yuriy <[email protected]> wrote:
> On 2012-12-08 07:23 +0400, Georgiewskiy Yuriy wrote Thomas Pedersen:
>
> GY>On 2012-12-07 19:17 -0800, Thomas Pedersen wrote Georgiewskiy Yuriy:
> GY>
> GY>OK, I'll try this and report results. Can you also test on 2.4 GHz? As I
> GY>understand it, ch 149 is 802.11a? Or does it make no difference here?
>
> Also, the signal level matters in my case. I just removed the antennas from one of the nodes
> within a range of 3 meters, so it works only with pigtails. The signal drops to -70 to -80 dBm, and that
> triggers "failed to stop Tx DMA" immediately.

Are you talking about a different bug?

Thomas

2013-01-17 16:26:24

by Chaoxing Lin

[permalink] [raw]
Subject: RE: help: 802.11s bad performance with 802.11n enabled


TP> We recently applied
TP> https://github.com/cozybit/authsae/commit/0e5c65c3f773db820d6cee7b365cd4a70181c72d
TP> which may fix your issue.

All, I just found that the patch above introduces a segmentation fault.

Below is the patch content. Look at line 970: "cand->state" dereferences a NULL pointer, because the "if" condition guarantees that "cand" is NULL at that point.



      - if ((cand = find_peer(mgmt->sa, 0)) == NULL) {
      -     sae_debug(AMPE_DEBUG_FSM, "Mesh plink: plink open from unauthed peer\n");
 967  + /* "1" here means only get peers in SAE_ACCEPTED */
 968  + if ((cand = find_peer(mgmt->sa, 1)) == NULL) {
 969  +     sae_debug(AMPE_DEBUG_FSM, "Mesh plink: plink open from unauthed peer "MACSTR" state=%d\n",
 970  +               MAC2STR(mgmt->sa), cand->state);
 971        return 0;
 972    }
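
One way to avoid the crash is to log only the address when the lookup returns NULL, since there is no candidate whose state could be printed. The following self-contained C sketch shows that pattern. struct candidate, find_peer_stub and handle_plink_open are stand-ins for illustration, not the authsae definitions.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the authsae peer structure (illustration only). */
struct candidate {
    int state;
};

/* Stand-in lookup: always reports "no accepted peer" for this example. */
static struct candidate *find_peer_stub(const unsigned char *addr, int accepted_only)
{
    (void)addr;
    (void)accepted_only;
    return NULL;
}

/* Returns 0 when the open frame is ignored, 1 when a peer was found. */
int handle_plink_open(const unsigned char *sa)
{
    struct candidate *cand = find_peer_stub(sa, 1);

    if (cand == NULL) {
        /* cand is NULL in this branch, so cand->state must not be read. */
        fprintf(stderr, "Mesh plink: plink open from unauthed peer "
                "%02x:%02x:%02x:%02x:%02x:%02x\n",
                sa[0], sa[1], sa[2], sa[3], sa[4], sa[5]);
        return 0;
    }
    return 1;
}
```

If the peer's state is still wanted in the log message, a separate lookup that is not restricted to accepted peers, with its own NULL check, would presumably be needed.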