2018-02-02 15:11:14

by Toke Høiland-Jørgensen

Subject: [PATCH] mac80211: Adjust TSQ pacing shift

Since we now have the convenient helper to do so, actually adjust the
TSQ pacing shift for packets going out over a WiFi interface. This
significantly improves throughput for locally-originated TCP
connections. The default pacing shift of 10 corresponds to ~1ms of
queued packet data. Adjusting this to a shift of 8 (i.e. ~4ms) improves
1-hop throughput for ath9k by a factor of 3, whereas increasing it more
has diminishing returns.

Achieved throughput for different values of sk_pacing_shift (average of
5 iterations of 10-sec netperf runs to a host on the other side of the
WiFi hop):

sk_pacing_shift 10:  43.21 Mbps (pre-patch)
sk_pacing_shift  9:  78.17 Mbps
sk_pacing_shift  8: 123.94 Mbps
sk_pacing_shift  7: 128.31 Mbps

Latency for competing flows increases from ~3 ms to ~10 ms with this
change. This is about the same magnitude of queueing latency induced by
flows that are not originated on the WiFi device itself (and so are not
limited by TSQ).

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
net/mac80211/tx.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 25904af38839..69722504e3e1 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3574,6 +3574,14 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 	if (!IS_ERR_OR_NULL(sta)) {
 		struct ieee80211_fast_tx *fast_tx;
 
+		/* We need a bit of data queued to build aggregates properly, so
+		 * instruct the TCP stack to allow more than a single ms of data
+		 * to be queued in the stack. The value is a bit-shift of 1
+		 * second, so 8 is ~4ms of queued data. Only affects local TCP
+		 * sockets.
+		 */
+		sk_pacing_shift_update(skb->sk, 8);
+
 		fast_tx = rcu_dereference(sta->fast_tx);
 
 		if (fast_tx &&
--
2.16.0


2018-02-02 17:01:46

by David P. Reed

Subject: RE: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift

I'm curious about the "WiFi Aware" initiative by the WiFi Alliance.

Does LEDE and/or Linux support this protocol? I know gSupplicant is potentially the way such things are supposed to work, at least according to its supporters.

The general NAN (Neighbor Awareness Networking) concept makes a lot of sense at one level, but as an Internet guy, it troubles me that they decided to split from the Internet and go a balkanized direction. To me, the neighborhood is interesting only as part of a larger Internet.

It also troubles me that WiFi Aware is a "certification program" rather than a real standard.

_______________________________________________
Make-wifi-fast mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/make-wifi-fast

2018-02-02 20:13:38

by Arend Van Spriel

Subject: Re: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift

On 2/2/2018 5:55 PM, [email protected] wrote:
> I'm curious about the "WiFi Aware" initiative by the WiFi Alliance.
>
> Does LEDE and/or Linux support this protocol? I know gSupplicant is potentially the way such things are supposed to work, at least according to its supporters.
>
> The general NAN (Neighbor Awareness Networking) concept makes a lot of sense at one level, but as an Internet guy, it troubles me that they decided to split from the Internet and go a balkanized direction. To me, the neighborhood is interesting only as part of a larger Internet.
>
> It also troubles me that WiFi Aware is a "certification program" rather than a real standard.

It troubles me that you are breaking into an email conversation with a
topic that in my opinion is totally unrelated. Although probably not
intended as such, it seems rude. Just start your own conversation.

Regards,
Arend

2018-02-14 08:18:46

by Toke Høiland-Jørgensen

Subject: Re: [PATCH] mac80211: Adjust TSQ pacing shift



On 14 February 2018 01:43:25 CET, Ryan Hsu <[email protected]> wrote:
>I knew increasing the value doesn't help much after 8 for ath9k, but I
>ran a test on ath10k showing that 6 or 7 gives the optimal number.
>Since ath10k/11ac devices have higher bandwidth than ath9k/11n, can we
>consider using 6 or 7 to accommodate that effect?
>
>shift  tx (Mbps)  CPU usage (%)
>5      404        28.5
>6      398        13.8
>7      401        8
>8      378        5
>9      230        4.5
>10     79.6       2

Why does the CPU usage go up >7? Also, what is the latency impact of each
of those values?

-Toke

2018-02-14 00:43:50

by Ryan Hsu

Subject: Re: [PATCH] mac80211: Adjust TSQ pacing shift

On 02/02/2018 07:11 AM, Toke Høiland-Jørgensen wrote:
> +		sk_pacing_shift_update(skb->sk, 8);

I knew increasing the value doesn't help much after 8 for ath9k, but I ran a
test on ath10k showing that 6 or 7 gives the optimal number.
Since ath10k/11ac devices have higher bandwidth than ath9k/11n, can we consider
using 6 or 7 to accommodate that effect?

shift  tx (Mbps)  CPU usage (%)
5      404        28.5
6      398        13.8
7      401        8
8      378        5
9      230        4.5
10     79.6       2

I have a quad core machine.

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i5-3380M CPU @ 2.90GHz

-- 
Ryan Hsu

2018-02-14 08:23:45

by Jonathan Morton

Subject: Re: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift

> On 14 Feb, 2018, at 10:18 am, Toke Høiland-Jørgensen <[email protected]> wrote:
>
> Why does the CPU usage go up >7?

Just as a guess, it's generating extra packets which are then laboriously discarded and retransmitted.

- Jonathan Morton

2018-03-02 01:10:04

by Ryan Hsu

Subject: Re: [Make-wifi-fast] [PATCH] mac80211: Adjust TSQ pacing shift

On 02/14/2018 12:23 AM, Jonathan Morton wrote:

>> On 14 Feb, 2018, at 10:18 am, Toke Høiland-Jørgensen <[email protected]> wrote:
>>
>> Why does the CPU usage go up >7?
> Just as a guess, it's generating extra packets which are then laboriously discarded and retransmitted.
>
> - Jonathan Morton

I think for 11n, like ath9k, 8 might be good enough, but 11ac could
aggregate a little more.

Yes, and CPU usage goes up after 6 or 7. That might be because it generates
extra packets while the physical bus caps the throughput, so we can't see
much throughput difference past that point (or maybe my setup is not optimal;
I assumed we should be seeing around 550-600 Mbps for TCP on 11ac). Only the
CPU usage increased.

--
Ryan Hsu